Rick Jelliffe writes of his ongoing frustration with Microsoft Word's lack of openness (this is not a negative post from Rick, though):
I welcome both [OpenDocument format and Microsoft OpenXML] but have a big eyeroll thinking of the twenty years of missed opportunities which Microsoft has cheated its users out of by not providing an XML interface until recently: I remember trying out their appalling SGML Author for Word more than a decade ago and wishing they just had a simple mini-SGML version of RTF instead, like the Rainbow DTD. I hope the “Open” in “Open XML” refers to a change of thinking in MicroSoft management in favour of agressive interoperability.)
A big eyeroll from me too. I too dealt with the terrible SGML Author thing.
I'd like to go on about one particular missed opportunity in the history of Word: the 'almost but not quite XML', nearly full-fidelity round-trip export feature in Word 2000. I seem to remember that this feature came after some mentions that Word 2000 was going to have XML export, which in the end it didn't.
This round-trip Save as HTML... feature uses an unholy mishmash of HTML and islands of XML which is deliberately broken in all sorts of weird ways. But going on a lead from something Jon Udell wrote, I worked out that you could transform it into 'real' well-formed XML to get something very like the Rainbow format to which Rick alludes; a useful document interchange format.
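To give a flavour of the clean-up step, here is a minimal sketch in Python. It is not the script from my xml.com article, and it makes no attempt to deal with Word's XML islands or conditional sections. It only shows the general idea of re-serialising tag soup as well-formed markup: quote the attributes, escape the text, close what was left open. The names `WellFormer`, `to_well_formed` and `is_well_formed` are made up for this illustration, and the list of empty elements is an assumption, not a complete one.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Elements that never have content in HTML; emitted as self-closing tags.
# (Illustrative subset, not a complete list.)
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class WellFormer(HTMLParser):
    """Re-serialise tag soup as balanced, quoted, escaped markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []     # output fragments
        self.open_tags = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        # Keep the last value of a repeated attribute; XML forbids duplicates.
        unique = {}
        for name, value in attrs:
            unique[name] = name if value is None else value
        attr_text = "".join(f" {n}={quoteattr(v)}" for n, v in unique.items())
        if tag in VOID_TAGS:
            self.pieces.append(f"<{tag}{attr_text}/>")
        else:
            self.pieces.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag, or the close of a void element: drop it
        # Close anything still open inside this element first.
        while True:
            name = self.open_tags.pop()
            self.pieces.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        # Character references are already decoded; re-escape &, < and >.
        self.pieces.append(escape(data))

    def result(self):
        # Close whatever is still open at the end of the input.
        while self.open_tags:
            self.pieces.append(f"</{self.open_tags.pop()}>")
        return "".join(self.pieces)


def to_well_formed(html_text):
    """Return a balanced, XML-style serialisation of html_text."""
    parser = WellFormer()
    parser.feed(html_text)
    parser.close()
    return parser.result()


def is_well_formed(xml_text):
    """Check a string with a real XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Feeding it something like `<div><p class=MsoNormal>Hello<br>world &amp; more</div>` gives markup an XML parser will accept, at which point XSLT can take over. Comments and declarations are simply dropped, and input with more than one top-level element would still need a wrapper, so treat this as a starting point only.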
I gave the idea to the team from TeraText, because I wanted to use it on the project we had running at Standards Australia. They built it in to their Ace scripting language. (Now there's another missed opportunity: Ace was by far the most complete and comfortable XML / SGML programming language I had used until .NET and C# came along, and you can still get yourself a free (beer) copy.)
I wrote an article about this for xml.com, complete with a little Python / XSLT script to do the transform. Problem is, I wrote it at least four years too late. Now there's Word 2003's XML format and the new OpenXML formats to look forward to, not to mention using OpenDocument, via OpenOffice.org, which is what we do in the ICE application.
I think this was a missed opportunity for the XML publishing community. I wish I'd been more vocal in promoting the idea beyond my own small circle: had the technique been better known, we could have had ourselves a lot more Word mashups a lot sooner, and it might have broken Word's formats open sooner.
I have always wondered what went on in Microsoft in 1999. The export format they produced was this close to being XML. I have always imagined that somewhere in the lab it was well-formed XML, and the poor old programmers were made to dumb it down. I like to think that someone inside Microsoft was trying to make Word as open as possible and it was up to us, as a community, to find the crack they left for us to crawl through.
Now I think it is the OpenDocument crowd who are largely responsible for the new openness at Microsoft, because they provided an alternative to Microsoft's ubiquitous but largely closed and/or lame systems and forced Microsoft to compete.
Anyway – if I had to work with Word right now I would either:
Use OpenOffice.org to convert documents via the ICE code.
Use Word 2000's Save as HTML... and transform it to XML.
And kids, don't forget to use styles.