[Update: added link to the paper]
The short paper Duncan Dickinson and I put together for this conference is organised around the conference themes, and what our Research and Development group at the Australian Digital Futures Institute is doing about each of them. In this presentation I will pick out some of the work we’re doing and some of the issues we’re thinking about, and try to relate this work back to the conference themes.
It’s gratifying to be able to make this presentation. Some of the core ideas we’re talking about here were the subject of a proposal I submitted for Open Repositories 2007, but it received mixed reviewer feedback and was relegated to a poster; my message that repositories were stuck in “Web 0.5” and needed to be made more webish was not timely.
Conference themes
Please tick these off as I go.
- The web and the repository & The cloud and the desktop*
- Knowledge and technology
- Wild and curated content
- Linked and isolated data & Ad-hoc and long-term access
- Disciplinary and institutional systems / Scholars and service providers
- Ubiquitous and personalized environments
* This is the big one!
So, what is this AWE thing?
AWE: The Academic Working Environment
Duncan: “A set of purposeful technologies brought together by standard interfaces for data exchange?”
Peter: “I needed a single cover-all name for all the different projects we were working on in our institute, so that I could try to report to the powers that be in a more efficient way.”
The Web and the Repositories
We’re on the web, but are we of the web?
Screenshot: HTML documents derived from Word documents

Screenshot: ICE conversion service

Screenshot: ICE conversion options for word processing

Screenshot: Annotations on a (this) document

Screenshot: Paquete stand-alone demo

Issue: What will happen when vertically controlled computing platforms are the norm?
Does the desktop/lab/home PC become a server? Or will it live in the cloud?
I brought an iPad with me on this trip, but I have found that I can’t use standard file operations to put music on it, or to get at the books I have bought across four (yes, four) different reader applications. Imagine the problems if we have to deal with valuable research data that lives on these controlled devices: a whole new era of format lock-in that makes Microsoft look like a free-software hippy.
This is one reason why the web, and delivering stuff in web formats, are important.
Idea: How about ePUB as a repository packaging format?
(In my reading of the specs) ePUB is engineered for overloading:
A zip file containing:
- (At least) XHTML, with optional extra elements & a flat table of contents.
- Can include video, chemistry, whatever if you provide fallback image/text.
- Allows for alternate renditions such as PDF, docx or odt originals.
- We could include an HTML 5 version (using something like Paquete) for modern browsers/devices.
- JavaScript is allowed, but must be ignored by ePUB readers; it could still be used by web apps.
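To make the list above concrete, here is a minimal sketch in Python (standard library only) of an ePUB 2-style zip: the uncompressed `mimetype` entry first, `META-INF/container.xml` pointing at a package file, an XHTML `index.html`, a one-entry NCX table of contents, and a PDF original listed as an alternate rendition with a `fallback` to the XHTML. The file names, metadata values and the `build_epub` function are illustrative assumptions, not part of any of our tools, and the output has not been checked with an ePUB validator.

```python
import io
import zipfile

# Hypothetical example content. Only "mimetype" and "META-INF/container.xml"
# have names fixed by the ePUB container format; the rest are our choices.
UID = "urn:uuid:00000000-0000-0000-0000-000000000000"

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

CONTENT_OPF = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example paper</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="uid">{UID}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="text" href="index.html" media-type="application/xhtml+xml"/>
    <!-- Alternate rendition: the original PDF, falling back to the XHTML -->
    <item id="orig" href="original.pdf" media-type="application/pdf" fallback="text"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="text"/>
  </spine>
</package>"""

TOC_NCX = f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{UID}"/></head>
  <docTitle><text>Example paper</text></docTitle>
  <navMap>
    <navPoint id="start" playOrder="1">
      <navLabel><text>Start</text></navLabel>
      <content src="index.html"/>
    </navPoint>
  </navMap>
</ncx>"""

INDEX_HTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example paper</title></head>
  <body><p>The paper itself, as plain XHTML.</p></body>
</html>"""


def build_epub(pdf_bytes: bytes) -> bytes:
    """Return the bytes of a minimal ePUB-style zip package (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        text_files = {
            "META-INF/container.xml": CONTAINER_XML,
            "content.opf": CONTENT_OPF,
            "toc.ncx": TOC_NCX,
            "index.html": INDEX_HTML,
        }
        for name, text in text_files.items():
            zf.writestr(name, text.encode("utf-8"), compress_type=zipfile.ZIP_DEFLATED)
        # The original document travels inside the package as an alternate rendition.
        zf.writestr("original.pdf", pdf_bytes, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Because the XHTML sits at the top level as `index.html`, simply unzipping the result and opening that file in a browser still works, which is the lowest-common-denominator case.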
Idea: ePUB use-cases
Repository: (SIP/AIP/DIP?)
- Serve the content using an in-page eReader (Paquete).
- Let the package handle package semantics (pre-print, published version, presentation) while the repository continues to handle streams.
- Support viewers for data types for defined periods, such as JMOL for chemistry, using something like ePUB’s fall-back mechanism and oEmbed.
Users:
- If all else fails: unzip the package and click ‘index.html’
- Use with eBook reader software/hardware.
Developer / repository manager:
- Add more packaging info – ORE, METS etc.
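The fall-back idea in these use-cases can be illustrated with a sketch of how a consumer might choose a rendition. In ePUB 2, a manifest item may carry a `fallback` attribute naming another item; the Python below follows that chain until it reaches a media type the consumer says it supports. The molecule file, its `chemical/x-mdl-molfile` media type, the `resolve` function and the sets of supported types are all assumptions for illustration; this is not taken from Paquete or any other existing tool.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Hypothetical manifest: a chemistry data file, with an image and then an
# XHTML description as successive fallbacks.
MANIFEST = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="mol" href="caffeine.mol" media-type="chemical/x-mdl-molfile" fallback="mol-img"/>
    <item id="mol-img" href="caffeine.png" media-type="image/png" fallback="mol-text"/>
    <item id="mol-text" href="caffeine.html" media-type="application/xhtml+xml"/>
  </manifest>
</package>"""


def resolve(opf_xml: str, item_id: str, supported: set) -> str:
    """Follow fallback links from item_id; return the href of the first supported item."""
    root = ET.fromstring(opf_xml)
    items = {item.get("id"): item for item in root.iter(f"{OPF_NS}item")}
    seen = set()
    current = item_id
    while current is not None:
        if current in seen:  # guard against a malformed, circular chain
            raise ValueError(f"fallback cycle at {current!r}")
        seen.add(current)
        item = items[current]
        if item.get("media-type") in supported:
            return item.get("href")
        current = item.get("fallback")  # None ends the chain
    raise LookupError(f"no supported rendition for {item_id!r}")
```

With this manifest, a JMOL-capable web viewer would get the molecule file, an ordinary eBook reader that handles images would get the PNG, and a text-only reader would still get the XHTML description.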
Another thing we are looking at in ADFI is ways of bootstrapping the Linked Data/Semantic Web. I (Peter) have proposed a way of embedding RDF statements in documents using simple, interoperable URIs. Duncan has taken this work further with a system that can serve ‘proper’ RDF. The demo here shows this technique for metadata, but it could also be applied to other kinds of semantics: for example, to say that you are talking about someone, rather than asserting that they are an author or an editor.
Idea: Making linked data, well, links
- Step 1: Approach your repository or ID provider, search for self
http://nla.gov.au/nla.party-541658
Note: wrapping this link around some text is essentially meaningless. It’s not the semantic web; it’s the old web.
- Step 2: Copy the link labelled “Assert Authorship”
http://ontologize.me/meta/?r=http://purl.org/dc/terms/creator&o=http://nla.gov.au/nla.party-541658
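To show why such a link carries more than an ordinary hyperlink, here is a sketch of a consumer that harvests assertion links from an HTML page and turns each one into an RDF-style (subject, predicate, object) triple. It assumes only the URL pattern visible in the Step 2 example (`r` carries the predicate, `o` the object) and treats the page containing the link as the subject; the `extract_triples` function, the host check and the example page URI are hypothetical, and this is not the code behind Duncan’s RDF-serving system.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# Assumed: assertion links look like http://ontologize.me/meta/?r=<predicate>&o=<object>
ASSERT_HOST = "ontologize.me"


def link_to_triple(href: str, subject: str):
    """Return (subject, predicate, object) for an assertion link, else None."""
    url = urlparse(href)
    if url.netloc != ASSERT_HOST:
        return None  # an ordinary link: no statement is being made
    params = parse_qs(url.query)  # also undoes any percent-encoding
    if "r" not in params or "o" not in params:
        return None
    return (subject, params["r"][0], params["o"][0])


class AssertionLinkParser(HTMLParser):
    """Collects triples from <a href="..."> elements in a page."""

    def __init__(self, subject: str):
        super().__init__()
        self.subject = subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # entity refs like &amp; are already unescaped
        if href:
            triple = link_to_triple(href, self.subject)
            if triple:
                self.triples.append(triple)


def extract_triples(html: str, subject: str):
    parser = AssertionLinkParser(subject)
    parser.feed(html)
    parser.close()
    return parser.triples
```

A plain link to http://nla.gov.au/nla.party-541658 yields no triple, while the wrapped “Assert Authorship” link yields a `dc:terms/creator` statement about the page.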
Word processor-proof linked data – part 2
Conclusion
We (like many others) are building a toolkit for web/repository construction. The key reasons we rolled our own are:
- We care about having web-resources not just PDF.
- We wanted to be able to deploy the application to the desktop (hence a Java app that can be deployed with Apache Solr).
Copyright Peter Sefton & Duncan Dickinson, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project and published to WordPress using The Fascinator.
I’ve played with e-readers and e-pub enough to be 101% convinced that we need a reflowable object format to allow the document to be formatted on the fly for display in a device-appropriate manner (try reading a two-column PDF on any cheap e-reader if you’re not convinced).
Extending e-pub to allow incorporation of appropriate collateral (sound, video) makes eminent sense, as documents are heading towards a compound, multi-modal future, e.g. text of paper plus sound of conference presentation plus accompanying PowerPoint, just like lecture capture systems.
Embedding authorship and provenance information as part of the payload also makes sense and eases ingest into things like electronic study bricks etc.
Yes, I think ePub needs to be the format of publishing. Reflowable, single units for archiving, ‘of’ the web but portable for offline. Plus it’s HTML+CSS — stuff we (should) know how to work with.
Anyway, if you’re not talking with ThreePress, you might want to:
http://threepress.org