
General purpose document formats, again

2004-12-01

Emacs-addicted geek Jon Udell, who is not to be dismissed lightly, has returned to the topic of open document formats vs custom schemas, and where XHTML fits in. I have been working with the software development team at USQ's Distance and e-Learning Centre on this issue, and we are looking at bridging the gap between generic/general-purpose formats and custom formats.

Jon revisits Tim Bray's piece suggesting that XHTML might be a better choice than a custom XML dialect in many situations:

    No, I'm not saying that everyone should use XHTML or the OpenOffice.org formats for every document in the world. But I do think that the cost of rolling your own is a lot higher than you think, and you should really try to avoid doing that if you possibly can.

Jon's conclusion:

    I'm not ready to go along with the other conclusion he reaches in that posting -- that custom schemas are a red herring. But I agree that XHTML is more valuable than most people think. For the vast majority of useful documents, it can have as much structure as we need, and for the rest it can be extended internally with namespaced inclusions. But the real power arises from its hypertextual nature. For me, increasingly, there is no office, and there is no desktop, there is only a network of linked documents. A successful open document format will have to be supremely well-adapted to that environment, as XHTML is.

I have written on this matter numerous times and started the slow-moving Word Processor Interoperability project (which does at least have a template to download now) to look into a practical implementation of Tim Bray's idea.

I am now investigating this in a new context. At USQ there is a custom schema for courseware for print, web and CD delivery, known as GOOD (Jacek Radajewski will present a poster on GOOD at the Ninth Australasian Document Computing Symposium (ADCS 2004), December 13 in Melbourne).

We are considering doing both: maintaining the custom schema at the back end, with a more flexible front-end system for creating content. Obviously, a generic general-purpose document format is more permissive than the back-end GOOD schema, so some intervention will be needed to get documents from the authoring system 'up the hill' into GOOD. I think we have a few tricks that will minimize the manual part of that conversion, though.

The GOOD system is a custom schema (expressed as a DTD), with chapter-level elements that structure the overall flow of course materials in a standard way across a large corpus of offerings, and a content model appropriate for educational content. It also has an assessment module that can drive both print and computerized quizzing, with hooks at the back end into the university grade book in PeopleSoft.
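To give a flavour of the kind of structure involved, here is a small DTD fragment in the same spirit. To be clear, the element names below are my own invention for illustration; this is not the actual GOOD DTD:

    <!-- Illustrative only: invented names, not the real GOOD DTD -->
    <!ELEMENT course       (title, objectives?, chapter+, assessment?)>
    <!ELEMENT chapter      (title, introduction?, section+)>
    <!ELEMENT section      (title, (para | list | activity)+)>
    <!ELEMENT activity     (para | list)+>
    <!ELEMENT assessment   (question+)>
    <!ELEMENT question     (para, answer+)>
    <!ELEMENT answer       (#PCDATA)>
    <!ATTLIST answer correct (yes | no) "no">
    <!ELEMENT objectives   (para+)>
    <!ELEMENT introduction (para+)>
    <!ELEMENT list         (item+)>
    <!ELEMENT item         (#PCDATA)>
    <!ELEMENT title        (#PCDATA)>
    <!ELEMENT para         (#PCDATA)>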

GOOD is now used in the Distance and e-Learning Centre, and by some course leaders. It produces print, web and CD offerings automatically from the XML source. To get it into wider use I have been asked to look at an easy-to-approach, cross-platform editing system. One thing we are exploring is using XHTML as the basis of an easier-to-edit format, with the GOOD system remaining the target for production.

It is early days yet, but these are the ingredients we are considering for the new system (and we will be seeking further input and pilot users from the university):

- At the low level of granularity, a relatively free-form 'canvas' to draw on: something not too far from XHTML, with its generic heading, table and list elements, but with domain-specific extensions for things like activities or learning outcomes (see the sketch below). The extensions would be implemented in the word processor using the least-ugly hacks available.

- At a higher level, a schema-like content system that would stitch the smaller granules together according to domain-specific rules, so you would be able to specify the top-level structure of a course and have it drive the content management system; at least that level would be guaranteed to comply with the house schema.
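As a rough sketch of what such a low-level granule might look like (to be clear, the lx namespace and the lx:activity element are hypothetical, invented for illustration; in the word processor the extension might equally be hacked in as a named style or class attribute):

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:lx="http://example.edu/ns/learning">
      <head><title>Module 3: Document formats</title></head>
      <body>
        <h1>Module 3: Document formats</h1>
        <p>Generic headings, tables and lists need no special markup.</p>
        <!-- domain-specific extension as a namespaced inclusion -->
        <lx:activity>
          <p>Convert this page to the course schema and note which
             structures survive the round trip.</p>
        </lx:activity>
      </body>
    </html>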

I know that such a system, with high-level schematization tying together loosely structured granules, can work well: we used the approach in NextEd's Continuous Publishing System (CPS), described in a poster paper for AUSWEB04, but the CPS was not trying to map to a complex schema. How much human intervention will be required to map word-processor-produced or edited content, where it is virtually impossible to enforce constraints on structure, back into the GOOD structure is an open question; we will have to find out from trials.
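One way to picture the 'up the hill' step is as a transform that promotes loosely structured granules into the stricter schema and kicks anything it cannot classify back to a human. Here is a minimal sketch in XSLT 1.0; as above, the chapter, para and activity element names (and the lx namespace) are invented for illustration, not the real GOOD DTD:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:lx="http://example.edu/ns/learning">
      <xsl:output method="xml" indent="yes"/>

      <!-- treat one XHTML granule as one chapter -->
      <xsl:template match="/h:html">
        <chapter>
          <title><xsl:value-of select="h:body/h:h1[1]"/></title>
          <xsl:apply-templates select="h:body/*"/>
        </chapter>
      </xsl:template>

      <!-- the first h1 has already been used as the chapter title -->
      <xsl:template match="h:h1"/>

      <xsl:template match="h:p">
        <para><xsl:apply-templates/></para>
      </xsl:template>

      <!-- a marked-up extension maps straight across... -->
      <xsl:template match="lx:activity">
        <activity><xsl:apply-templates/></activity>
      </xsl:template>

      <!-- ...anything unrecognized is flagged for human attention -->
      <xsl:template match="h:body/*" priority="-1">
        <review-me><xsl:copy-of select="."/></review-me>
      </xsl:template>
    </xsl:stylesheet>

The escape hatch for unclassifiable content is, of course, exactly where the human intervention cost will show up; that is what the trials will have to measure.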
