ptsefton.github.io

Via [Brian Jones](https://rubric-central.usq.edu.au/projects/trac/rubric/timeline?ticket=on&wiki=on&max=50&daysback=90&format=rss), who writes about the new Office XML formats for Microsoft, I welcome the news from Joe Friend that there will be built-in [blogging in Word 2007](http://blogs.msdn.com/joe_friend/archive/2006/05/12/595963.aspx). This is good news not so much for the blogging bit but for the way that Word will be able to make clean HTML from styles.

Joe Friend only [mentions](http://blogs.msdn.com/joe_friend/archive/2006/05/12/595963.aspx) a couple of styles (h1 and, I assume, quotes, or does he mean any paragraph enclosed in quote marks and indented?):

> Go ahead, click View, Source in your browser and look at the HTML
> starting with "Word is a great tool..." We really are going pretty
> basic here. Bold become \<strong\>, Italic becomes \<em\>, Heading 1
> become \<h1\>, Quotes become \<blockquote\> and on it goes. There are
> definitely kinks in Beta 2. For example we are encoding smart quotes
> incorrectly so I had to turn off that feature in Word, but the goal is
> to output just what is needed to make your blog post clean and
> readable (code and rendered HTML).

That's fine, but what about lists, and pre-formatted text embedded in quotes and so on? (And actually I think bold should map to \<b\>, or nothing, and you should use a style called 'strong' if that's what you want.)

Well, at the ICE project we have developed a stylesheet that can drive clean HTML output, and we have templates for both Word and OpenOffice.org – so I can post to this blog from Microsoft Word or any OpenDocument-aware application, like OpenOffice.org. I have covered this in a number of previous posts.
So look [here](http://ptsefton.com/blog/2005/09/18) for an example, and [here](http://ptsefton.com/blog/2005/09/05/blog_this_button_for_openoffice.org_(well_half_anyway)) for some stuff about the ICE approach to blogging from a word processor, and in the [pre-print of the paper](http://eprints.usq.edu.au/archive/00000697/) I'll be giving on ICE at Ausweb 06 for some more detail about how the mapping works. I'll quote that paper here:

> The core styles are listed below.

| Family | Type | Style names |
| --- | --- | --- |
| Paragraph (p) | | p |
| Heading (h) | | h1 h2 h3 h4 h5 |
| Heading (h) | Numbered (number) | h1n h2n h3n h4n h5n |
| List item (li) | Numbered (number) | li1n li2n li3n li4n li5n |
| List item (li) | Bullet (bullet) | li1b li2b li3b li4b li5b |
| List item (li) | Uppercase Alpha (A) | li1A li2A li3A li4A li5A |
| List item (li) | Lowercase Alpha (a) | li1a li2a li3a li4a li5a |
| List item (li) | Lowercase Roman (i) | li1i li2i li3i li4i li5i |
| List item (li) | Uppercase Roman (I) | li1I li2I li3I li4I li5I |
| List item (li) | Continuing paragraph (p) | li1p li2p li3p li4p li5p |
| Blockquote (bq) | | bq1 bq2 bq3 bq4 bq5 |
| Definition List | Term (dt) | dt1 dt2 dt3 dt4 dt5 |
| Definition List | Description (dd) | dd1 dd2 dd3 dd4 dd5 |
| Pre-formatted (pre) | | pre1 pre2 pre3 pre4 pre5 |
| Metadata | title (title) | Title |

**Table of style names for paragraph styles in ICE.**

> The set of style names is designed to be different to those that ship
> by default with major word processors in order to emphasize that this
> is a self-contained system. For example, a first level heading is
> called h1, rather than Heading 1 in Word or OpenOffice.org while a
> first level bulleted list item would be li1b for “list item, level 1,
> bullet”.
>
> In the default style-sets that come with other word processors this
> kind of list item might be “List 1” in OpenOffice.org, or “List Bullet
> 1” in Word. The Word style name is more readable than the ICE style,
> but at the cost of being so long that it can be difficult to work with
> in Word itself, when trying to view style names in the left margin (a
> feature denied to users of OpenOffice.org).

So, what if Word 2007 finally shipped with the Normal template containing a complete set of styles, like the ICE styles, that would cover pretty much the same territory as HTML? Not just headings, but different flavours of numbered list, definition lists, pre-formatted text and blockquotes in a number of levels that could be combined. Something a bit better than the feeble, incomplete set of styles Microsoft has been shipping for years. Hey Joe, you can [contact me](mailto:pt@ptsefton.com) if you'd like some help – I've been **working on this issue for ten years**.

(And what if the much-hyped new clean Word interface defaulted to using styles for its formatting? Imagine if pressing those little list-icons gave you not only list-like formatting but style-driven list-based formatting. That would mean that you could **export clean HTML** and **really interoperate with other packages**.)

Given a decent set of styles then finally the default `Save as HTML...` in Word could produce nice clean HTML. Please, please, Microsoft, don't tell us that you've continued to bury and de-value styles, and make templates even harder to find in the interface.
For example, if Word's HTML export system saw a paragraph with the style `List Bullet 1` followed by `List Bullet 2`, it would know how to output a nested list in HTML. At the moment **HTML export in any word processor is severely handicapped** by having to divine good mappings to HTML from a completely open-ended formatting palette, with the result that clean export is pretty much impossible. You can read about my frustrations with the OpenOffice.org Writer application [here](http://ptsefton.com/blog/2005/10/31/why_do_i_keep_going_on_about_html_export_from_word_processors%3F).

And going a bit further, wouldn't it be great if OpenOffice.org and Microsoft Word and Google's Writely (see my [post](http://ptsefton.com/blog/2006/03/21/writely,__meet_the_ice_template)) all understood the same set of styles and could make clean HTML from them? (They all agree on “Heading 1, Heading 2” but that's as far as it goes.)

OK, so maybe Microsoft and Sun and Google don't care. But [**we**](http://ice.usq.edu.au/) do, so we'll continue in our struggle to provide good word processor interoperability even if we have to code it ourselves. It would just be so much easier if the vendors helped the community.
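
To make the nested-list point concrete, here is a minimal sketch of how an exporter might turn a run of styled paragraphs into nested HTML lists. It is not the ICE code and not anything Word or OpenOffice.org does. The function name `paragraphs_to_html`, the `LIST_TAGS` mapping and the `(style, text)` input shape are all made up for illustration, and it uses the ICE-style names (`li1b`, `li2b` and so on) rather than Word's `List Bullet 1`. Continuing paragraphs (`li1p`), blockquotes, definition lists and pre-formatted styles are left out, and a change of list type at the same level is ignored.

```python
import re
from html import escape

# Assumption: list-item styles are "li" + level (1-5) + a type suffix,
# following the ICE naming scheme (li1b, li2n, ...).
LIST_STYLE = re.compile(r"li([1-5])([nbAaiI])")
HEADING_STYLE = re.compile(r"h([1-5])n?")

# Hypothetical mapping from type suffix to (opening tag, closing tag).
LIST_TAGS = {
    "b": ("<ul>", "</ul>"),
    "n": ("<ol>", "</ol>"),
    "A": ('<ol type="A">', "</ol>"),
    "a": ('<ol type="a">', "</ol>"),
    "I": ('<ol type="I">', "</ol>"),
    "i": ('<ol type="i">', "</ol>"),
}


def paragraphs_to_html(paragraphs):
    """Turn (style_name, text) pairs into HTML, nesting lists by level."""
    out = []
    open_lists = []  # closing tags of the lists currently open, outermost first

    def close_lists(down_to):
        # Close the open item and its list until only `down_to` lists remain.
        while len(open_lists) > down_to:
            out.append("</li>")
            out.append(open_lists.pop())

    for style, text in paragraphs:
        item = LIST_STYLE.fullmatch(style)
        if item is None:
            # Not a list item: close every open list, then emit a heading
            # or fall back to a plain paragraph.
            close_lists(0)
            heading = HEADING_STYLE.fullmatch(style)
            tag = f"h{heading.group(1)}" if heading else "p"
            out.append(f"<{tag}>{escape(text)}</{tag}>")
            continue

        level, kind = int(item.group(1)), item.group(2)
        close_lists(level)
        if len(open_lists) == level:
            out.append("</li>")  # finish the previous item at this level
        while len(open_lists) < level:
            opening, closing = LIST_TAGS[kind]
            out.append(opening)
            open_lists.append(closing)
            if len(open_lists) < level:
                out.append("<li>")  # wrapper item when a level is skipped
        out.append(f"<li>{escape(text)}")

    close_lists(0)
    return "".join(out)


doc = [
    ("h1", "Shopping"),
    ("li1b", "Fruit"),
    ("li2b", "Apples"),
    ("li2b", "Pears"),
    ("li1b", "Bread"),
    ("p", "Done."),
]
print(paragraphs_to_html(doc))
```

The only information the converter needs is the style name on each paragraph: the level digit says how deep to nest and the suffix says which kind of list to open. With an open-ended formatting palette there is nothing equivalent to read, which is why the exporter is reduced to guessing.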