At a recent RUBRIC meeting we discussed the idea that citation and copyright details need to be on the actual documents in a repository. Copyright statements in metadata don’t mean much once the document has been downloaded and is ‘in the wild’, most likely lost amongst 3000 documents with names like item_8593.pdf.

How about an automated way of summarizing document metadata and the relevant rights statement and adding it as a cover page to a PDF (or HTML document) in a repository? That way the user will be able to see what they’ve got when it’s kicking around their hard disk. This should be easy enough to automate. Maybe it would make a good ARROW mini-project (strangely there’s a text doc about these grants) if a developer wants to take it on.
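To give a feel for how little is involved, here is a rough sketch of the HTML case only, using nothing but the Python standard library. The field names, the `repository-cover` class and the `add_cover` function are all made up for illustration and don't come from any particular repository software; a PDF version would need a PDF library to build the page and prepend it, which is not shown here.

```python
import re
from html import escape

# Hypothetical metadata keys and the labels to show on the cover block.
FIELDS = [
    ("title", "Title"),
    ("creator", "Author"),
    ("date", "Date"),
    ("identifier", "Repository URL"),
    ("citation", "Cite as"),
    ("rights", "Rights"),
]


def build_cover_html(meta):
    """Return an HTML fragment summarizing the metadata and rights statement."""
    rows = "\n".join(
        f"  <dt>{escape(label)}</dt><dd>{escape(str(meta[key]))}</dd>"
        for key, label in FIELDS
        if meta.get(key)  # skip fields the repository has no value for
    )
    return f'<div class="repository-cover">\n<dl>\n{rows}\n</dl>\n</div>'


def add_cover(html_doc, meta):
    """Insert the cover block straight after the opening <body> tag."""
    cover = build_cover_html(meta)
    match = re.search(r"<body[^>]*>", html_doc, flags=re.IGNORECASE)
    if match is None:
        # No <body> tag found: fall back to putting the cover first.
        return cover + "\n" + html_doc
    return html_doc[: match.end()] + "\n" + cover + html_doc[match.end():]
```

A repository could run something like `add_cover(page, record)` at download time, so the stored file stays untouched and the cover always reflects the current metadata.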

A related idea is to store document metadata in the XMP metadata stream in a PDF. This was covered in what seems to be an unpublished paper (Howison 2004), which pointed out that an iTunes-like approach to managing PDFs would be very helpful. The Zotero research application should be able to become iTunes for PDFs by recognizing and/or embedding metadata in your research library, but as far as I can tell that’s not yet a feature.
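For the curious, an XMP packet is just RDF/XML wrapped in a pair of processing instructions, so building one is not hard. The sketch below assembles a minimal packet with three Dublin Core properties (title, creator, rights) using only the Python standard library. The `build_xmp` function and its arguments are my own invention for illustration, and actually writing the packet into a PDF's metadata stream is a separate step that needs a PDF library and is not shown.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{NS[prefix]}}}{tag}"


def build_xmp(title, creators, rights):
    """Return a minimal XMP packet (as a string) with Dublin Core fields."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title and dc:rights are language alternatives (rdf:Alt).
    for tag, text in (("title", title), ("rights", rights)):
        alt = ET.SubElement(ET.SubElement(desc, q("dc", tag)), q("rdf", "Alt"))
        item = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
        item.text = text

    # dc:creator is an ordered list of names (rdf:Seq).
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    body = ET.tostring(xmpmeta, encoding="unicode")
    # The xpacket wrapper marks where the packet starts and ends in the file.
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )
```

Calling `build_xmp("Some title", ["A. Author"], "Copyright statement here")` gives a string that could be handed to whatever tool does the embedding; the values here are placeholders, not real records.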

