[Update: Fixed a typo and added two more names] I’m in paper-writing mode at the moment, which means doing some actual reading-type research. I’m interested in issues like why repositories use PDF and not HTML.
It’s surprising how many papers there are on ‘Library 2.0’ or ‘Repositories and Web 2.0’ that manage to not mention HTML at all, or discuss file formats in technical terms but neglect to talk about what people actually use to write up their research.
“XML is good and Microsoft Word is bad” is a fairly common analysis. Great. What should I do next?
Found one lovely example of a paper reluctantly defending PDF, yet its own title is mangled in Google Scholar presumably cos it is only available as PDF.
I’m having trouble finding much about the following:
Actual data on what authoring tools people use – we have plenty of anecdotal data that most authors use Word, with pockets of LaTeX in tech disciplines.
Prior art in embedding metadata in word processing documents, inline, in-text, using styles. Found some really ambitious stuff about Web 2.0 but not much on the more modest goal of being able to identify authors’ names reliably.
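To make that second, more modest goal concrete, here is a rough Python sketch of what style-based metadata could look like in practice. It assumes the author marked up the title and author paragraphs with paragraph styles called "Title" and "Author" (those style names, the sample XML and the function name are my own illustration, not taken from any of the tools or papers mentioned here). It reads WordprocessingML of the kind found in the word/document.xml part of a .docx file and pulls out the text of every paragraph carrying a given style.

```python
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML (the XML inside a .docx file).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical sample document body, invented for illustration only.
# The style IDs "Title" and "Author" are assumptions about how an
# author might have tagged the paragraphs.
SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr><w:r><w:t>An Example Paper</w:t></w:r></w:p>
    <w:p><w:pPr><w:pStyle w:val="Author"/></w:pPr><w:r><w:t>First Author</w:t></w:r></w:p>
    <w:p><w:pPr><w:pStyle w:val="Author"/></w:pPr><w:r><w:t xml:space="preserve">Second </w:t></w:r><w:r><w:t>Author</w:t></w:r></w:p>
    <w:p><w:r><w:t>Body text with no special style.</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraphs_by_style(document_xml, style):
    """Return the text of each paragraph whose paragraph style ID equals `style`."""
    root = ET.fromstring(document_xml)
    found = []
    for p in root.iter(f"{{{W}}}p"):
        pstyle = p.find("w:pPr/w:pStyle", NS)
        if pstyle is not None and pstyle.get(f"{{{W}}}val") == style:
            # A paragraph's text may be split across several runs; join them.
            text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
            found.append(text.strip())
    return found


if __name__ == "__main__":
    print("Title:", paragraphs_by_style(SAMPLE, "Title"))
    print("Authors:", paragraphs_by_style(SAMPLE, "Author"))
```

The catch, of course, is that this only works if authors apply the styles consistently, and the value stored in the file is the style ID, which is not always the same as the name shown in the word processor.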
Does anyone have any sources to get me started? There may be pockets of research I’m not finding through Google Scholar?
Dorothea Salo, you commented here recently and you know a lot about this stuff…
Andrew Treloar, anything come out of DART/ARCHER on this?
Ian Barnes, you’ve looked – any sources?
Chris Blackall, I know you worry about these things.
Susan Gibbons & team have looked at this, and Nathan Sarr is implementing (has implemented?) some tools.
Comments are open but I may not get to moderate them over the weekend, so be patient.
Here’s the little collection I have tagged in Zotero so far (lots of detail missing):
Barnes, I., 2006. Preservation of word-processing documents. Retrieved October 2, 2006.
Eriksson, H., 2007. The semantic-document approach to combining documents and ontologies. International Journal of Human-Computer Studies, 65(7), p.624-639.
Murray, S., 2008. Open science, open access and open source software at Open Medicine. Available at: http://www.openmedicine.ca/article/viewArticle/205/104 [Accessed February 22, 2008]. [Grabbed this cos it refers to Lemon8XML]
Tallis, M., Semantic Word Processing for Content Authors.
Witten, I.H. et al., 2002. Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture. Proceedings of the European Conference on Digital Libraries, p.390–405.