ptsefton.github.io

This is part one in a series of posts about preserving my two theses for posterity and making them available for open access. One is a BA Hons thesis, and the other a PhD. I'm sure they're both safe in the library at the University of Sydney in paper form. Actually I'm not sure – just assuming. (The title of this post is a bit of a play on the phrase '[Self archiving](http://www.eprints.org/openaccess/self-faq/)', used by [Open Access](http://en.wikipedia.org/wiki/Open_access) activists.)

Anyway, with the ICE for Research and Scholarship project starting up ([ICE-RS](http://ice.usq.edu.au/introduction/ice_rs.htm)), I thought it was time to make an example of myself, and put my own stuff into (a) ICE, and (b) the appropriate repository. This is part one of the process, so the appropriate repository is a way off, but I assume it's [the one at the University of Sydney](http://setis.library.usyd.edu.au/ses/about.html). I am looking forward to filling out this [form](http://setis.library.usyd.edu.au/ses/index.php).

The first stage of this process is to get the documents into modern standards-based formats. Later I'll look into [PREMIS](http://www.oclc.org/research/projects/pmwg/).

The [ICE-RS](http://ice.usq.edu.au/introduction/ice_rs.htm) project is about connecting repositories like the [Sydney eScholarship Repository](http://setis.library.usyd.edu.au/ses/about.html) to the authoring process that actual scholars go through, so they don't have to go through the process I'm going through now. ICE wasn't around when I did my honours year or PhD, but I have always had a bit of a thing for orderly word processing, in contrast to the rest of my life. This means that both documents use styles. Give or take a few challenges I'll talk about another day, the text of the documents is really easy to extract. The pictures, on the other hand, caused some problems.

At this point I should confess that if all I wanted to do was produce the standard PDF files of my theses and send them off to the repository I could be relaxing by the pool with a banana daiquiri by now. Only we don't have a pool, we have a once-in-a-1000-year drought and bananas are still around \$10/kg.

Anyway, I wanted to test out ICE and get versions of my thesis that are more preservation-ready than a mere PDF. So I'm going to aim for getting PDF (both a book and chapter by chapter), and nice HTML, all in an IMS package. I may even try for a DocBook version, using [the code that Ian Barnes wrote](http://www.apsr.edu.au/publications/preservation_of_word_processing_documents.html) to convert ICE documents to DocBook.

Squeamish readers might like to click off [somewhere else](http://www.sanrio.com/) now.

# Getting the stuff

What are we dealing with here? A B.A. Honours thesis from 1990, and a PhD from 1994/5. Both born digital. I'm getting hazy about the details, but the two theses in question were written in Microsoft Word, respectively on a Mac Plus with no hard disc and a Mac PowerBook 160 with an enormous 40MB hard disc. The files all ended up on a 100MB Zip disk which fortunately has stayed with me through five house moves. I gave the Zip drive to a friend years ago, along with the SCSI card. I made a half-hearted attempt to rescue the files circa 2003, and only got serious week-before-last.

Finding a Mac with a Zip drive was surprisingly easy 'cos our RUBRIC office was located next door to the multimedia people, who had a big blue Mac running OS X stashed in a locked office, just in case. We've since moved to the library, which is full of actual books, and people pretending to study while they listen to their iPods. Good luck finding an ancient Mac in there.

Things went well at first. I stuffed all three Zip discs I had into the Mac in turn (the labels were less than informative) and found the one with a full backup of the hard disk from the PowerBook 160.
It was nicely organized into folders. I tried to copy the backup to a USB drive. It worked ... right up until the part where I got an error to do with file names. After some mucking around I gave up and zipped the backup, then copied it to the USB.

Back at my desk, using the ultra-modern black MacBook (Intel inside!), I started trying to open stuff up. Memories came flooding back as I retrieved bits of half-written novel and stuff I wrote for university papers. (Actually, this was only after I had to call the campus Mac guru and get him to put Microsoft Office on my Mac. Again, good luck if you don't work in a friendly little university like USQ where you can just call someone and make this stuff happen.)

But a lot of the files wouldn't open at all, including those for the all-important theses. I tried lots of tricks, but the most productive was looking at the files with the plain-old `less` command to see the text. There was something near the top of the un-openable files that mentioned Stuffit, the compression utility. Dragging them onto a modern version of Stuffit did nothing, so I searched and searched for information and/or a usable version of Stuffit. The oldest download I could find didn't like the files either, so I went back to the old Zip-enabled Mac.

Dim memories of bits of software that helped you squeeze more stuff onto tiny disks began to surface. I poked around my Zip disc and found the archive I had made of all the applications on my PowerBook. One of them was called Disk Doubler.

Note: If I hadn't saved the programs along with the data, and found a computer that could run them off the obsolete media, I would have lost the data.

It took me a while to figure out how to turn on 'Classic', the old-style Mac emulator, via System Preferences, but when I did so and fired up Disk Doubler, it was able to unpack the heretofore unusable files from my theses. I spent a long time playing with the thing until I figured out how to do large batches of files, but in the end I had readable Word files. (This can't be done on new Macs – only older ones.)

Success! Well, sort of. The PhD has a really weird problem with headings which I'll go through another day, but the honours looks great. If all I wanted was a bunch of PDF files for the chapters I'd be relaxing by the non-existent pool by now. But when I opened the docs in Word and re-saved them in modern Word format it turned out that OpenOffice, and thus ICE, didn't like them – images were garbled. Strangely I had better luck on a Windows machine; more of that in Part 2.

A major disappointment is that a folder full of home-made song lyrics is still unreadable; there are songs in there of great significance, one for a wedding, half-remembered and completely forgotten songs, and songs performed on the same bill as some of the very first [Whitlams](http://en.wikipedia.org/wiki/The_Whitlams) gigs at the Sandringham Hotel by a band that you will not find in the Wikipedia.

# Summary

I think that if I'd left this project for another few years it would have been a lost cause: easier to scan the paper copies than to try to rescue the digital copies. The above process took one day – and I got some other things done too – but I was very, very lucky to have all the tools on hand. Bottom line is, if you've got old media with important stuff, the time to act is now, before the people who can help all drop off the twig and the computers are all gone.

Next installment: the story of obsolete PICT files and really inconsistent image rendering, as I put together an ICE version of my honours thesis.
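
For anyone facing a similar folder of unopenable files: the check that cracked the case, looking near the top of each file for a mention of a compression utility, could be scripted rather than done one file at a time with `less`. What follows is only a sketch, not what was actually done. The function name `find_marker`, the 512-byte window and the `backup` folder name are all made up for illustration; the only marker text that comes from the story above is "Stuffit".

```python
from pathlib import Path


def find_marker(root, marker=b"stuffit", head_bytes=512):
    """Return files under root whose first head_bytes mention marker.

    The comparison is case-insensitive. Both the default marker and the
    512-byte window are assumptions for this sketch, not known properties
    of any particular compression format.
    """
    needle = marker.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Read only the start of the file; the clue was near the top.
        with path.open("rb") as handle:
            head = handle.read(head_bytes)
        if needle in head.lower():
            hits.append(path)
    return hits
```

Calling `find_marker("backup")` would list the files worth trying against an old decompression tool. Files packed by a different utility may carry different text, so the marker would need adjusting after eyeballing a few headers by hand.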