ptsefton.github.io

At a recent RUBRIC meeting we discussed the idea that citation and copyright details need to be **on the actual documents** in a repository. Copyright statements in metadata don't mean much once a document has been downloaded and is 'in the wild', most likely lost amongst 3000 other files with names like `item_8593.pdf`. How about an automated way of summarizing a document's metadata and the relevant rights statement and adding it as a cover page to a PDF (or HTML document) in a repository? That way users will be able to see what they've got when it's kicking around their hard disk. This should be easy enough to automate. Maybe it would make a good ARROW mini-project (strangely, the scheme is described in a plain [text doc](http://www.arrow.edu.au/docs/files/ARROW_Mini-Projects_Scheme.txt)) if a developer wants to take it on.

A related idea is to store document metadata in the XMP metadata stream in a PDF. This was covered in what seems to be an unpublished paper ([Howison 2004](http://freelancepropaganda.com/archives/MP3vPDF.pdf)), which pointed out that an iTunes-like approach to managing PDFs would be very helpful. The [Zotero](http://www.zotero.org/) research application should be able to become iTunes for PDFs by recognizing and/or embedding metadata in the documents in your research library, but as far as I can tell that's not yet a feature.
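To make the cover-page idea a bit more concrete, here is a minimal sketch of the HTML case using only the Python standard library. The metadata record, its field names, the citation format and the example URL are all hypothetical; a real repository would pull these from its own metadata store. The PDF case would additionally need a PDF library to render the page and prepend it, which isn't shown here.

```python
from html import escape

# Hypothetical metadata record: field names and values are illustrative only,
# not taken from any particular repository's schema.
record = {
    "title": "An Example Paper",
    "creators": ["Smith, Jane", "Doe, John"],
    "year": "2007",
    "repository": "Example University Repository",
    "handle": "http://repository.example.org/item/8593",
    "rights": "Copyright 2007 the authors. Reproduced with permission.",
}


def citation(rec):
    """Build a simple author-year citation string (format is an assumption)."""
    authors = "; ".join(rec["creators"])
    return f'{authors} ({rec["year"]}) {rec["title"]}. {rec["repository"]}. {rec["handle"]}'


def cover_page_html(rec):
    """Return an HTML fragment summarising the citation and rights statement."""
    return (
        '<div class="cover-page">\n'
        f'  <h1>{escape(rec["title"])}</h1>\n'
        f'  <p class="citation">{escape(citation(rec))}</p>\n'
        f'  <p class="rights">{escape(rec["rights"])}</p>\n'
        "</div>\n"
    )


def add_cover_page(html_doc, rec):
    """Insert the cover page right after the opening <body> tag, if there is one."""
    cover = cover_page_html(rec)
    start = html_doc.lower().find("<body")
    if start == -1:
        # No <body> element: simply prepend the cover page.
        return cover + html_doc
    close = html_doc.find(">", start)
    if close == -1:
        # Malformed <body tag: fall back to prepending.
        return cover + html_doc
    end = close + 1
    return html_doc[:end] + "\n" + cover + html_doc[end:]


if __name__ == "__main__":
    print(add_cover_page("<html><body><p>Full text here</p></body></html>", record))
```

Escaping every metadata value keeps stray `&` or `<` characters in titles from breaking the page, and putting the cover inside `<body>` means the rest of the document is left untouched.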
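For the XMP idea, the metadata itself is just an RDF/XML packet. The sketch below, again standard library only, builds a packet carrying Dublin Core title, creators and rights, following the usual XMP structure (`x:xmpmeta` wrapping `rdf:RDF`, with `rdf:Alt` for language alternatives and `rdf:Seq` for the ordered creator list). The function names and example values are hypothetical, and actually writing the packet into a PDF's metadata stream would need a PDF library; that step is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP packets and Dublin Core properties.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# ElementTree serialises this namespace as the built-in xml: prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return an ElementTree qualified name such as '{uri}local'."""
    return f"{{{NS[prefix]}}}{local}"


def lang_alt(parent, tag, text):
    """Add a language-alternative property (used here for dc:title and dc:rights)."""
    alt = ET.SubElement(ET.SubElement(parent, q("dc", tag)), q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"))
    li.set(XML_LANG, "x-default")
    li.text = text


def xmp_packet(title, creators, rights):
    """Build an XMP packet string with Dublin Core title, creators and rights."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"))
    desc.set(q("rdf", "about"), "")
    lang_alt(desc, "title", title)
    # dc:creator is an ordered array (rdf:Seq) of names.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name
    lang_alt(desc, "rights", rights)
    body = ET.tostring(meta, encoding="unicode")
    # The xpacket wrapper uses a byte-order mark and a fixed id value.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')


if __name__ == "__main__":
    print(xmp_packet("An Example Paper", ["Smith, Jane", "Doe, John"],
                     "Copyright 2007 the authors."))
```

Putting the rights statement in `dc:rights` alongside the title and creators is what would let an iTunes-style tool show who made a PDF and what you may do with it, even after the file has been renamed.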