This is part one in a series of posts about preserving my two theses for
posterity and making them available for open access. One is a BA Hons
thesis, and the other a PhD. I'm sure they're both safe in the library
at the University of Sydney in paper form. Actually I'm not sure – just
assuming.
(The title of this post is a bit of a play on the phrase '[Self
archiving](http://www.eprints.org/openaccess/self-faq/)', used by [Open
Access](http://en.wikipedia.org/wiki/Open_access) activists.)
Anyway, with the ICE for Research and Scholarship project starting up
([ICE-RS](http://ice.usq.edu.au/introduction/ice_rs.htm)), I thought it
was time to make an example of myself, and put my own stuff into (a)
ICE, and (b) the appropriate repository. This is part one of the
process, so the appropriate repository is a way off, but I assume it's
[the one at the University of
Sydney.](http://setis.library.usyd.edu.au/ses/about.html) I am looking
forward to filling out this
[form](http://setis.library.usyd.edu.au/ses/index.php). The first stage
of this process is to get the documents into modern standards-based
formats. Later I'll look into
[PREMIS](http://www.oclc.org/research/projects/pmwg/).
The [ICE-RS](http://ice.usq.edu.au/introduction/ice_rs.htm) project is
about connecting repositories like the [Sydney eScholarship
Repository](http://setis.library.usyd.edu.au/ses/about.html) to the
authoring process that actual scholars go through so they don't have to
go through the process I'm going through now.
ICE wasn't around when I did my honours year or PhD, but I have always
had a bit of a thing for orderly word processing, in contrast to the
rest of my life. This means that both documents use styles. Give or
take a few challenges I'll talk about another day, the text of the
documents is really easy to extract. The pictures, on the other hand,
caused some problems.
At this point I should confess that if all I wanted to do was produce
the standard PDF files of my theses and send them off to the repository
I could be relaxing by the pool with a banana daiquiri by now. Only we
don't have a pool, we have a once-in-a-1000-year drought and bananas are
still around \$10/kg. Anyway, I wanted to test out ICE and get versions
of my thesis that are more preservation-ready than a mere PDF. So I'm
going to aim for getting PDF (both a book and chapter by chapter), and
nice HTML all in an IMS package. I may even try for a DocBook version,
using [the code that Ian Barnes
wrote](http://www.apsr.edu.au/publications/preservation_of_word_processing_documents.html)
to convert ICE documents to DocBook.
Squeamish readers might like to click off [somewhere
else](http://www.sanrio.com/) now.
# Getting the stuff
What are we dealing with here? A B.A. Honours thesis from 1990, and a
PhD from 1994/5. Both born digital.
I'm getting hazy about the details, but the two theses in question were
written in Microsoft Word, the first on a Mac Plus with no hard disc and
the second on a Mac PowerBook 160 with an enormous 40MB hard disc. The
files all ended up on a 100MB Zip disk which fortunately has stayed with
me through five house moves. I gave the Zip drive to a friend years ago,
along with the SCSI card. I made a half-hearted attempt to rescue the
files circa 2003, and only got serious the week before last.
Finding a Mac with a Zip drive was surprisingly easy 'cos our RUBRIC
office was located next door to the multimedia people, who had a big blue
Mac running OS X stashed in a locked office, just in case. We've since
moved to the library, which is full of actual books, and people
pretending to study while they listen to their iPods. Good luck finding
an ancient Mac in there.
Things went well at first. I stuffed all three Zip discs I had into the
Mac in turn (the labels were less than informative) and found the one
with a full backup of the hard disk from the PowerBook 160. It was
nicely organized into folders.
Tried to copy the backup to a USB drive.
It worked ...
... right up until the part where I got an error to do with file names.
After some mucking around I gave up and zipped the backup then copied it
to the USB.
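
The exact error isn't recorded here, so the following is only a guess at one common cause: old Mac file names often contain characters that a FAT-formatted USB drive refuses. A minimal sketch, assuming that was the problem, which lists the offending names before you attempt a copy. The function names and the forbidden-character set are illustrative choices, not part of any tool mentioned in this post.

```python
from pathlib import Path

# Characters that FAT/Windows file systems reject in file names (assumed cause).
FAT_FORBIDDEN = set('\\/:*?"<>|')


def fat_problems(name: str) -> list[str]:
    """Describe why a file name might be rejected by a FAT-formatted drive."""
    problems = []
    bad = sorted(c for c in set(name) if c in FAT_FORBIDDEN or ord(c) < 32)
    if bad:
        problems.append(f"forbidden characters: {bad!r}")
    if name != name.rstrip(" ."):
        problems.append("trailing space or dot")
    return problems


def risky_names(root: Path) -> dict[Path, list[str]]:
    """Walk a folder tree and collect every entry whose name looks risky."""
    report = {}
    for path in sorted(root.rglob("*")):
        problems = fat_problems(path.name)
        if problems:
            report[path] = problems
    return report
```

Calling `risky_names(Path("backup"))` on a hypothetical `backup` folder returns a dictionary of paths and reasons. Wrapping everything in a single archive, as I ended up doing, sidesteps the issue because the drive only ever sees one safe file name.
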
Back at my desk, using the ultra-modern black MacBook (Intel inside!) I
started trying to open stuff up. Memories came flooding back as I
retrieved bits of half-written novel and stuff I wrote for university
papers.
(Actually, this was only after I had to call the campus Mac guru and get
him to put Microsoft Office on my Mac (again, good luck if you don't
work in a friendly little university like USQ where you can just call
someone and make this stuff happen)).
But a lot of the files wouldn't open at all, including those for the
all-important theses. I tried lots of tricks, but the most productive
was looking at the files with the plain-old `less` command to see the text.
There was something near the top of the un-openable files that mentioned
Stuffit, the compression utility. Dragging them onto a modern version of
Stuffit did nothing, so I searched and searched for information and/or a
usable version of Stuffit. The oldest download I could find didn't like
the files either, so I went back to the old Zip-enabled Mac.
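
Peering at files one by one with `less` gets tedious when there are hundreds of them. Here is a rough sketch of the same trick in Python: read the first few hundred bytes of each file, pull out the runs of readable text, and flag any that mention a compression utility. The list of names in `HINTS` and the 512-byte header size are assumptions for illustration, and a hit is only a clue about what made the file, not proof.

```python
import re
from pathlib import Path

# Hypothetical list of utility names to look for; extend it as you learn more.
HINTS = ("stuffit", "diskdoubler", "disk doubler", "compact pro")


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def header_hints(path: Path, nbytes: int = 512) -> list[str]:
    """Return strings from the first nbytes of a file that mention a hint."""
    with open(path, "rb") as f:
        head = f.read(nbytes)
    return [
        s for s in printable_strings(head)
        if any(h in s.lower() for h in HINTS)
    ]


def scan(root: Path) -> dict[Path, list[str]]:
    """Check every file under root and keep the ones with a matching header."""
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            hits = header_hints(path)
            if hits:
                found[path] = hits
    return found
```

Running `scan(Path("backup"))` over a hypothetical `backup` folder gives a quick list of files that probably need an old utility to unpack them, which is a better starting point than double-clicking and hoping.
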
Dim memories of bits of software that helped you squeeze more stuff onto
tiny disks began to surface. I poked around my Zip disc and found the
archive I had made of all the applications on my PowerBook. One of them
was called Disk Doubler.
Note
: If I hadn't saved the programs along with the data and found a
computer that could run them off the obsolete media I would have
lost it.
It took me a while to figure out how to turn on 'Classic', the old-style
Mac emulator, via System Preferences, but when I did so and fired up Disk
Doubler, it was able to unpack the heretofore unusable files from my
theses. I spent a long time playing with the thing until I figured out
how to do large batches of files, but in the end I had readable Word
files. (This can't be done on new Macs – only older ones.)
Success!
Well, sort of. The PhD has a really weird problem with headings which
I'll go through another day, but the honours looks great. If all I
wanted was a bunch of PDF files for the chapters I'd be relaxing by the
non-existent pool by now. But when I opened the docs in Word and re-saved
them in modern Word format it turned out that OpenOffice, and thus ICE,
didn't like them – images were garbled. Strangely, I had better luck on a
Windows machine; more on that in Part 2.
A major disappointment is that a folder full of home-made song lyrics is
still unreadable; there are songs in there of great significance, one
for a wedding, half remembered and completely forgotten songs, and songs
performed on the same bill as some of the very first
[Whitlams](http://en.wikipedia.org/wiki/The_Whitlams) gigs at the
Sandringham Hotel by a band that you will not find in the Wikipedia.
# Summary
I think that if I'd left this project for another few years it would
have been a lost cause: easier to scan the paper copies than to try to
rescue the digital copies.
The above process took one day – and I got some other things done too,
but I was very, very lucky to have all the tools on hand.
Bottom line is, if you've got old media with important stuff, the time to
act is now, before the people who can help all drop off the twig and the
computers are all gone.
Next installment: the story of obsolete PICT files and really
inconsistent image rendering, as I put together an ICE version of my
honours thesis.