Anyone know about research into what authoring tools academics use?
I’m in paper-writing mode at the moment which means doing some actual reading-type research. I’m interested in issues like why repositories use PDF and not HTML.
It’s surprising how many papers there are on ‘Library 2.0’ or ‘Repositories and Web 2.0’ that manage to not mention HTML at all, or discuss file formats in technical terms but neglect to talk about what people actually use to write up their research.
“XML is good and Microsoft Word is bad” is a fairly common analysis. Great. What should I do next?
Found one lovely example of a paper reluctantly defending PDF, yet its own title is mangled in Google Scholar presumably cos it is only available as PDF.
I’m having trouble finding much about the following:
Actual data on what authoring tools people use – we have plenty of anecdotal data that most authors use Word, with pockets of LaTeX in tech disciplines.
Prior art in embedding metadata in word processing documents: inline, in-text, using styles. Found some really ambitious stuff about Web 2.0 but not much on the more modest goal of being able to identify authors’ names reliably.
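To make that modest goal concrete, here is a rough sketch of what I mean by using styles as metadata. It assumes a template in which authors mark each name with a paragraph style called "Author" (a hypothetical style name, not something I have found in the literature), and it reads a tiny hand-written WordprocessingML fragment rather than a real Word file. Treat it as an illustration of the idea, not a working tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML (the XML inside .docx files).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample document. The "Author" style is an assumption:
# it stands in for whatever style a template might ask authors to use.
SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr><w:r><w:t>An example paper</w:t></w:r></w:p>
    <w:p><w:pPr><w:pStyle w:val="Author"/></w:pPr><w:r><w:t>Jane </w:t></w:r><w:r><w:t>Doe</w:t></w:r></w:p>
    <w:p><w:pPr><w:pStyle w:val="Author"/></w:pPr><w:r><w:t>John Smith</w:t></w:r></w:p>
    <w:p><w:r><w:t>Body text with no style.</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraphs_by_style(document_xml, style):
    """Return the text of every paragraph tagged with the given style."""
    root = ET.fromstring(document_xml)
    found = []
    for p in root.iter(f"{{{W}}}p"):
        p_style = p.find("w:pPr/w:pStyle", NS)
        if p_style is not None and p_style.get(f"{{{W}}}val") == style:
            # A paragraph's text may be split across several runs; join them.
            text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
            found.append(text.strip())
    return found


authors = paragraphs_by_style(SAMPLE, "Author")
print(authors)
```

The point is only that if a template can get authors to apply one style consistently, pulling the names out is the easy part. The hard part is the one I can't find research on: what people actually do with their word processors.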
Does anyone have any sources to get me started? There may be pockets of research I’m not finding through Google Scholar?
Dorothea Salo, you commented here recently and you know a lot about this stuff…
Andrew Treloar, anything come out of DART/ARCHER on this?
Ian Barnes, you’ve looked – any sources?
Chris Blackall I know you worry about these things.
Susan Gibbons & team have looked at this, and Nathan Sarr is implementing (has implemented?) some tools.
Comments are open but I may not get to moderate them over the weekend, so be patient.
Here’s the little collection I have tagged in Zotero so far (lots of detail missing):
Barnes, I., 2006. Preservation of word-processing documents. Retrieved October 2, 2006.
Eriksson, H., 2007. The semantic-document approach to combining documents and ontologies. International Journal of Human-Computer Studies, 65(7), p.624-639.
Murray, S., 2008. Open science, open access and open source software at Open Medicine. Available at: http://www.openmedicine.ca/article/viewArticle/205/104 [Accessed February 22, 2008]. [Grabbed this cos it refers to Lemon8XML]
Tallis, M., Semantic Word Processing for Content Authors.
Witten, I.H. et al., 2002. Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture. Proceedings of the European Conference on Digital Libraries, p.390–405.
Sorry, I’ve had this marked to respond to for AGES and never found a round tuit. (Just don’t ask how far behind I am on email.)
You’re right. The literature does not address this question. I only know what I know from experience at a small editing-and-typesetting-and-SGML service bureau in the late ’90s. When we got electronic documents, they were Word or occasionally LaTeX (which could be truly annoying because authors wouldn’t send their mods along so we couldn’t reconstruct their doc the way they intended).
Our internal processes, even the ones with SGML components, were based on Word as an editing and communicating-with-authors format. We were Penta-based, so we could straight-up typeset SGML. When we did that, though, the doc came out of editing in well-templated Word, then got run through a VB wringer and a lot of regular expressions to make almost-SGML, then got the inevitable few hand fixes (I got better at scripting these away eventually), and then the SGML went into Penta.
A way to get at this question sideways might be to survey journal submission requirements. Some say they accept Word/TeX/HTML/whatever, but my sense is a pretty hefty majority insist on Word. Those that actually offer a template are almost entirely Word. In the last couple years I’ve written a book chapter, half a book, and two articles, and I’m on tap for a third article shortly. All asked for Word.
Comment by Dorothea Salo — 2008-05-13 @ 12:06 am