ptsefton

2008-09-30

ICE: eResearch for Word users

Filed under: Uncategorized — ptsefton @ 4:26 pm

    I’m just blogging this poster from OR08 to show that it can be done.

    About this hyperposter

    This poster is a hyperdocument designed to show some potential applications for eResearch publications.

    This document has embedded semantics.

    For example, it was written in:

    Embedded geographical data (via geohash) can be used to generate a map like the one here. On the web, this is an interactive, automatic process.

    [Image: map generated from the embedded geohash data]

    OpenStreetMap data can be used freely under the terms of the Creative Commons Attribution-ShareAlike 2.0 license.

    The mythical datument

    The term Datument was coined in 2004 by Peter Murray-Rust and Henry Rzepa:

    A datument is a hyperdocument for transmitting “complete” information including content and behaviour. … where the machine is supplied with tools which are semantically aware of the document content. Examples of the latter are domain-specific XML components such as maps (GML), graphics (SVG) and molecules (Chemical Markup Language, CML)

    Murray-Rust, P. & Rzepa, H.S., 2004. The Next Big Thing: From Hypermedia to Datuments. Journal of Digital Information, 5(1), p.248. Available at: http://jodi.tamu.edu/Articles/v05/i01/Murray-Rust/?printable=1

    But they are far from common. This poster / blog post / presentation / map-mashup might be the closest you have ever been to one.

    It’s only 2008; be patient!


    Produce PDF, HTML and more from word processors

    1. Microsoft Word (Windows & Mac)

    2. OpenOffice.org Writer & derivatives (Windows, Mac and Linux)

    3. Applies styles behind the scenes to capture structure

    4. Command line or web service for integration

    5. Open source built on Python + OpenOffice.org

    6. Works with Zotero

    7. Built in version control via Subversion

    8. Integrated with ePrints and other repositories (coming soon via the ICE-TheOREM project)


    ICE: a hub for collaborative authoring


    Ask me how

    (Metadata is embedded in the hyperposter using styles)

    Peter Sefton

    {p-meta-author-name}

    The University of Southern Queensland

    {p-meta-author-affiliation}

    peter.sefton@usq.edu.au

    {p-meta-author-email}

    +61 (0) 410 326955

    {p-meta-author-phone-mobile}

    Also available in machine readable form:

    • Dublin Core

      <oai_dc:dc>
       <dc:title>ICE: eResearch for Word users</dc:title>
       <dc:creator>Peter Sefton</dc:creator>
      </oai_dc:dc>
    • RDF ORE resource map for migration to repositories
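As a rough illustration of how style-based metadata could end up as Dublin Core, here is a minimal Python sketch. It assumes the document has already been reduced to (style name, text) pairs; the `extract_metadata` and `to_dublin_core` helpers and the mapping of the author-name style to `dc:creator` are assumptions made for this example, not a description of how ICE does it.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI Dublin Core records.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

META_PREFIX = "p-meta-"


def extract_metadata(paragraphs):
    """Collect {field: text} from paragraphs whose style starts with p-meta-."""
    return {
        style[len(META_PREFIX):]: text
        for style, text in paragraphs
        if style.startswith(META_PREFIX)
    }


def to_dublin_core(title, metadata):
    """Serialise a title and the extracted metadata as an oai_dc record."""
    root = ET.Element("{%s}dc" % OAI_DC)
    ET.SubElement(root, "{%s}title" % DC).text = title
    # Assumption: the author-name field maps to dc:creator.
    if "author-name" in metadata:
        ET.SubElement(root, "{%s}creator" % DC).text = metadata["author-name"]
    return ET.tostring(root, encoding="unicode")


# Paragraphs as (style, text) pairs, using the styles shown on the poster.
paragraphs = [
    ("p", "Ask me how"),
    ("p-meta-author-name", "Peter Sefton"),
    ("p-meta-author-affiliation", "The University of Southern Queensland"),
    ("p-meta-author-email", "peter.sefton@usq.edu.au"),
]
metadata = extract_metadata(paragraphs)
dc_xml = to_dublin_core("ICE: eResearch for Word users", metadata)
print(dc_xml)
```

The point of the sketch is only that the styles carry enough information for a machine to rebuild the record; the other fields (affiliation, email, phone) have no obvious Dublin Core home and are left in the dictionary.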

    2008-09-19

    Is this thing working?

    Filed under: Uncategorized — ptsefton @ 12:54 pm

    I’m working on my hyperposter for eResearch Australasia 2008. This is a test to see if the mapping system here is still working.

    This document has embedded semantics. It was written in:

    1. Toowoomba at USQ [Update: fixed spelling] (S 27.601335° E 151.930854°),

    2. for a conference in Melbourne (S 37.849925° E 144.978368°)
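For the curious, coordinates like these can be turned into a geohash with a few lines of Python. This is a sketch of the standard public geohash encoding (interleaved longitude and latitude bits, base-32 alphabet), not the code the mapping system here actually runs; the function names and the choice of nine characters are assumptions. Southern latitudes are written as negative numbers.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_decode(code):
    """Return the (lat, lon) centre of the cell named by a geohash."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True
    for ch in code:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)


places = {
    "Toowoomba": (-27.601335, 151.930854),
    "Melbourne": (-37.849925, 144.978368),
}
hashes = {name: geohash_encode(lat, lon) for name, (lat, lon) in places.items()}
for name, code in hashes.items():
    print(name, code, geohash_decode(code))
```

Decoding gives back the centre of a small cell rather than the exact point, so a round trip is only accurate to within the cell size for the chosen precision.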

    2008-09-09

    Embedding XML in word processing documents (if you really must)

    Filed under: Uncategorized — ptsefton @ 1:38 pm

    Rick Jelliffe has posted a comparison of how foreign XML can be embedded in OOXML (that’s the XML format for Microsoft Office) and ODF (the Open Document Format).

    Rick starts with:

    First the caveat: Word and OpenOffice are not general-purpose XML editors.

    Right. That means that if you do decide that there’s a case for embedding extra XML in OOXML or ODF then you are going to have to supply add-ons to the applications in question to edit it. So what does this mean for the two formats? (As usual I’ll just talk about the word processing format here and ignore spreadsheets and the rest.)

    For OOXML, you would have to create a Word Addin such as the one I’ve looked at here before. There could be a business case for that, but you’d have to accept that your documents were only going to be editable in Word 2007+. I gather from recent posts that Rick does some work on projects where this does make good business sense.

    For OpenOffice.org you’re out of luck. Rick’s tests show that OpenOffice.org strips out foreign markup. It’s unclear whether this is conformant behaviour or not:

    But the bottom line for foreign elements as wrappers in ODF and OOXML is that ODF allows them to be stripped out while OOXML doesn’t allow that; neither of course require that any particular application understands them. The bottom line for OpenOffice and Office seems to be that OpenOffice strips them (dangerously, but perhaps allowed because of bad drafting of that part of the ODF standard) while Office 2007 does allow them.

    As I’ve covered here many times, ODF interoperability between applications is basically non-existent except between Microsoft Office and OpenOffice.org and its derivatives, where some things work quite well. Bottom line is, ODF doesn’t have any formal notion of what’s conformant; it’s up to application developers to implement the bits they feel like implementing.

    The OpenDocument specification does not specify which elements and attributes conforming application must, should, or may support. The intention behind this is to ensure that the OpenDocument specification can be used by as many implementations as possible, even if these applications do not support some or many of the elements and attributes defined in this specification. Viewer applications for instance may not support all editing relates elements and attributes (like change tracking), other application may support only the content related elements and attributes, but none of the style related ones.

    http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.1-os.pdf

    I think for most uses a much better bet is to use microformats, which leverage the built-in features of the formats. These not only work in the aforementioned major applications for OOXML or ODF; in many cases they interchange between the formats quite nicely as well.

    What’s a word processing microformat? One example would be using a one-cell borderless table with a paragraph in it of style ‘h-warning’ to indicate a bit of content that’s a warning, to use Rick’s example. OK, so using a table is inelegant, but it works in both Word and OpenOffice.org Writer and will survive round tripping between .doc and .odt and .docx. You could use a frame, which is a more semantically neutral element, and sacrifice some interop, or you could use styles only, which is a bit harder for users to manage and more error prone. Actually, Rick gives an example of a styles-based microformat approach.

    We use this kind of technique to do things like generate slide shows from text embedded in documents, and we’re developing methods for embedding metadata in documents using styles.
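To make the table microformat concrete, here is a small Python sketch of how a converter might spot it. The block model (tuples for paragraphs and one-cell tables), the `render` function and the idea that any style starting with `h-` names a class are all assumptions for illustration; only the ‘h-warning’ style comes from the example above.

```python
from html import escape


def render(blocks):
    """Render blocks to HTML.

    A block is either ("p", style, text) or ("table", [paragraph blocks]),
    the latter standing for a one-cell table.
    """
    html = []
    for block in blocks:
        if block[0] == "table":
            cell = block[1]
            inner = "".join("<p>%s</p>" % escape(text) for _, _, text in cell)
            first_style = cell[0][1] if cell else ""
            if first_style.startswith("h-"):
                # The table was only a container: emit a div, not a table.
                css_class = escape(first_style[2:])
                html.append('<div class="%s">%s</div>' % (css_class, inner))
            else:
                html.append("<table><tr><td>%s</td></tr></table>" % inner)
        else:
            _, _style, text = block
            html.append("<p>%s</p>" % escape(text))
    return "".join(html)


document = [
    ("p", "p", "Before you start:"),
    ("table", [("p", "h-warning", "Back up your files first.")]),
]
print(render(document))
```

The table never reaches the web page; it is just a container that both word processors happen to preserve.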

    2008-09-05

    More ideas about online and offline word processor integration - is anybody listening?

    Filed under: Uncategorized — ptsefton @ 11:47 am

    Via Glyn Moody, who doesn’t want to say he told us so, I see that Adobe is discontinuing support for Flashpaper, a proprietary Adobe (via Macromedia) technology for disseminating documents online. This means that anyone who has put stuff in there now has to migrate it all to some other format. That’s what you get for using technology that’s controlled by a single vendor.

    That reminded me that I had this piece I’ve been working on about Adobe Buzzword, another Adobe proprietary document format.

    Following my last post on Buzzword, I had an email from Tad Staley at Adobe which seemed encouraging:

    You had an interesting point about exporting named styles to Word. By this, I assume you mean that we create a handful of styles that correspond to Buzzword fonts and paragraph settings, and use them within the .doc or .docx file we create on exporting? This would then allow us to “round trip” the document better from Buzzword to Word and back again.

    We’d like to hear any other thoughts you have with respect to styles - as I said, we’re working on them now, so the timing is good.

    So I drafted something along the lines of what you’re about to read and sent it off to Tad. It was pretty clear from Tad’s reply that Adobe are not thinking along the same lines as me at all. They don’t see it as important to be able to interchange with other word processors because they’re going to make theirs broadly available, and they don’t care much for HTML because they care too much about controlling the fine details of document presentation. What this means is they’d like you to use Buzzword / Flash / PDF to disseminate your work rather than an open web format. OK, some PDF is a bit open, but it is very much page oriented and much harder to integrate with other services than HTML. I think that’s terribly short sighted and reduces considerably what people can do with their documents. Mashups and so on that are built for the open web would all have to be redone for the Buzzword world, for example my geocoding example.

    I’d be cautious of Buzzword if I were you, because this format could easily go the way of Flashpaper.

    Anyway here’s the gist of what I sent to Tad at Adobe.

    I think it would be a good idea for you to map Buzzword docs to a set of styles when you export to .doc or .docx; I’d like to see .odt as well. I’ll go into specifics below but first some general comments.

    There are so many issues with word processors regarding styles, interop and HTML export that it’s a bit hard to summarize in a short email or blog post, but here are some of the main problems. It would be great for someone to get it right for once:

    1. No standard set of styles: Nobody ships a rational set of styles by default - I’d be looking for something that covers headings (both numbered and not in the same document), lists, block-quotes, pre-format text; the list is actually very similar to the set of elements in HTML, which is no coincidence as that’s a generic schema.

    2. Awful HTML export: Word processors almost always try to reproduce whatever the user inputs in the way of formatting resulting in all sorts of crap in the HTML they output. Building a new product is a great opportunity to do it differently.

      (Buzzword’s HTML isn’t bad by comparison with some, within its current limitations. But really, you should fix the list nesting. I think the HTML model is silly too, but it is what it is.)

    3. No ‘structure-only’ mode: Why not have a UI mode where you can’t do gratuitous formatting, only structure your document using headings, lists, blockquotes etc. and then choose from a menu of stylesheets? That is, turn off the font panel. This may have been hard to sell in the old days, but now I think people would get it, particularly when they are writing for multiple media where the same document could be published as both HTML and PDF. If you restrict users to a known style set then you can reliably change the presentation of their documents automatically. If not then you have problems. A couple of examples:

      1. If people have chosen colours then you can’t change the background colour of a page in case you have readability problems.

      2. If you allow absolute indents (say 4cm) then you might not be able to reformat into multiple columns and still have the document look OK.

    4. Extreme confusion in the area of lists: Both Word (and by extension OOXML), and Writer (and by extension ODF) have mind-blowingly crazy list models.

      • Word has paragraph styles to which you can attach list formatting, and it has named outlines (with one of the worst GUIs ever even before Word 2007 took it to new heights) AND it has list styles which came along circa Word 2003.

      • Writer has paragraph styles and list styles, both of which can be applied independently, and lists are represented as a hierarchy in the file format, although the GUI gives almost no clues as to what the hierarchy actually is.

      In the ICE project we deal with this by automatically creating paragraph styles and list styles / named outlines and providing toolbars to apply both at once, resulting in much more stable, interoperable documents than you get if you leave users to deal with all this by themselves.
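The ‘known style set’ idea from point 3 is easy to check mechanically. Here is a minimal Python sketch that flags paragraphs whose style falls outside an agreed set; the style names in `ALLOWED_STYLES` and the `check_styles` helper are invented for this example and are not a real ICE or Buzzword style list.

```python
# Hypothetical structural style set: headings, paragraphs, quotes,
# preformatted text and two kinds of list item.
ALLOWED_STYLES = {"p", "h1", "h2", "bq1", "pre", "li1b", "li1n"}


def check_styles(paragraphs, allowed=ALLOWED_STYLES):
    """Return (paragraph number, style) for every paragraph using an unknown style.

    paragraphs is a list of (style, text) pairs; numbering starts at 1.
    """
    return [
        (number, style)
        for number, (style, _text) in enumerate(paragraphs, start=1)
        if style not in allowed
    ]


draft = [
    ("h1", "Quarterly report"),
    ("Normal + Arial 14pt Red", "Please read this!"),
    ("p", "Sales were up."),
]
problems = check_styles(draft)
print(problems)
```

A document that passes a check like this can be re-skinned safely, because nothing in it depends on formatting the stylesheet doesn’t know about.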

    So here’s what I would do if I had a chance to influence Buzzword, in addition to building in a standard kind of word processor style system.

    Based on my observation of the behavior of the list formatting, Buzzword obviously has some notion of structure built into it even if it doesn’t (yet) have headings. So let’s look at what you could do with lists.

    I think the Buzzword UI for lists is pretty cool; one thing I like is that lists stay connected. In most online editors if you change an item in the middle of a list into a plain paragraph and then back into a list item you get two disconnected lists, something that makes no structural or practical sense. Buzzword gets this right and makes sure that list items adjacent to each other are part of the same list.

    Here’s a test-list in Buzzword:

    [Image: screenshot of the test list in Buzzword]

    The UI is really slick; it actually understands the structure of the list, so when you hit the promote (<-) and demote (->) buttons it does The Right Thing. My only quibble is the way it insists on all the items at a particular level being the same kind of list item even if they are not siblings.

    Oh, and don’t call the list level ‘outline level’ because in other word processors that term is used for the heading structure.

    My proposal is that on export Buzzword should not just use formatting; it should create styles. As I mentioned before, this is more complex than it needs to be, due to the legacy of gratuitous features in the target applications, but it is doable.

    Let’s take the example of .odt export for use in the OpenOffice.org family of word processors. I’ll use the ICE version of the style names, chosen for their brevity, but you could use longer versions.

    Here’s the same test list embedded in this document, which I’m writing using NeoOffice. The paragraph style names are shown in curly braces at the end of each paragraph; behind the scenes my toolbar also applies a list style of the same name.

    Buzzword test document. {p}

    When exporting, you could embed some macros that provide a Buzzword-like interface via a toolbar. In ICE we have a toolbar which tries to Do The Right Thing (doesn’t always succeed I have to admit, but we’re getting there). We take a different approach from Buzzword’s modal interface and re-use the same buttons in different contexts. So the promote button in a list will move your list item to the left and it should pick up the right list style by looking back through the document to see what is appropriate - whereas for a heading it would change the heading level in the document outline.

    Why do we do this? It’s all about interoperability. The styles mean that we can produce good HTML, and also move documents between Word and Writer pretty easily, correcting for the differences between their wacky, annoying, productivity-sapping list models. And we give users on both word processors the same toolbar running the same code.

    One advantage for Adobe and their Buzzword product would be the same good interoperability with offline word processors. But there’s another potential benefit, the same one I suggested to Google. Adobe could start ‘infecting’ documents with a benign structure virus. Let’s see how this could work:
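Here is roughly what ‘styles mean good HTML’ looks like in code: a flat run of styled paragraphs is turned back into properly nested lists. This is a Python sketch under stated assumptions, not the ICE converter. The style names (`li1b` for a level-one bullet, `li2n` for a level-two numbered item and so on) are hypothetical short names in the spirit of the ones mentioned above, and the sketch ignores a change of list type at the same level.

```python
import re
from html import escape

# Hypothetical list styles: "li" + level (1-9) + "b" (bullet) or "n" (numbered).
LIST_STYLE = re.compile(r"^li([1-9])([bn])$")
TAGS = {"b": "ul", "n": "ol"}


def paragraphs_to_html(paragraphs):
    """Convert (style, text) pairs into HTML with correctly nested lists."""
    out = []
    stack = []  # list tags currently open; each one has an open <li>
    for style, text in paragraphs:
        match = LIST_STYLE.match(style)
        level = int(match.group(1)) if match else 0
        # Close any lists deeper than this paragraph.
        while len(stack) > level:
            out.append("</li></%s>" % stack.pop())
        if level == 0:
            out.append("<p>%s</p>" % escape(text))
            continue
        tag = TAGS[match.group(2)]
        if len(stack) == level:
            out.append("</li>")  # sibling item: close the previous one
        while len(stack) < level:
            stack.append(tag)
            out.append("<%s>" % tag)
            if len(stack) < level:
                out.append("<li>")  # empty item to carry a skipped level
        out.append("<li>%s" % escape(text))
    while stack:
        out.append("</li></%s>" % stack.pop())
    return "".join(out)


test_list = [
    ("p", "Buzzword test document."),
    ("li1b", "First"),
    ("li2b", "Nested"),
    ("li1b", "Second"),
]
print(paragraphs_to_html(test_list))
```

Nested lists sit inside the parent list item rather than beside it, which is the bit most word processor HTML export gets wrong.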

    1. I draft a blog post like this one in Buzzword and send it via Buzzword’s sharing feature to a colleague to add their contribution. I was going to say ‘a paper’ but Buzzword is a long way from ready for that.

    2. My colleague doesn’t want to sign up to yet another online service, and besides is going to be editing the document later at home, so chooses the option to download it as a Word document and saves it on a USB drive.

    3. Later at home Word prompts to say that the document contains macros and should they be allowed to run? If no, then it’s not the end of the world as we still have a Word document that can be re-imported to Buzzword later. If yes then read on.

    4. On opening the document, it’s got a Buzzword-style or ICE-style toolbar, so my colleague is able to make some changes to the document without realizing that they are dealing with the styles that were added to the document automatically on export.

    5. When the editing is done, they can save the document locally, but since there’s a toolbar installed they can click the ‘Return to sender’ button and it gets automatically uploaded back into my Buzzword account via an inbox.

      Because they used the toolbar all the headings are set properly and the lists are nice and orderly.

      (If you don’t understand why I’m going on about this go over to Google docs and try importing and exporting documents using OpenOffice.org Writer).

    6. Later, if my colleague decides that they did like the Buzzword experience they can click the ‘Install the Buzzword template’ button and have the toolbar show up all the time. If they go further and sign up for an account then they can draft things in Buzzword and have them save automatically into Buzzword.

    You can see how this could spread the Buzzword way of life not by replacing offline word processors but by providing a bridge into the online service. If the online way is better then people will naturally stop using their offline programs.

    A couple of other things that would help drive the service:

    • AtomPub support so you can post to your blog, both from the online service and from your word processor. ICE does this already.

    • Simple web page publishing. At the moment Buzzword does HTML export in a Zip file; why can’t it just put the page up online for you?

    • An import feature where when a user uploads an unstyled word processing document Buzzword gives it back with added styleage. (See the ongoing conversation I’m having with MJ Suhonos).
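On the AtomPub point: posting to a blog comes down to sending an Atom entry document to the blog’s collection URI. Here is a minimal Python sketch that builds such an entry; the `build_atom_entry` helper, the example id and the sample values are assumptions for illustration, and the HTTP POST itself (with a content type of application/atom+xml;type=entry) is left out because it needs a real endpoint and credentials.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace


def build_atom_entry(title, html_body, author, entry_id, updated):
    """Return an Atom entry document as a string."""
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = title
    ET.SubElement(entry, "{%s}id" % ATOM).text = entry_id
    ET.SubElement(entry, "{%s}updated" % ATOM).text = updated.isoformat()
    author_el = ET.SubElement(entry, "{%s}author" % ATOM)
    ET.SubElement(author_el, "{%s}name" % ATOM).text = author
    # type="html" means the text is escaped HTML markup.
    content = ET.SubElement(entry, "{%s}content" % ATOM, {"type": "html"})
    content.text = html_body
    return ET.tostring(entry, encoding="unicode")


entry_xml = build_atom_entry(
    title="Is this thing working?",
    html_body="<p>This is a test post.</p>",
    author="ptsefton",
    entry_id="urn:example:draft-1",  # placeholder id for the sketch
    updated=datetime(2008, 9, 19, 12, 54, tzinfo=timezone.utc),
)
print(entry_xml)
```

Since the word processor has already produced clean HTML from the styles, the entry body is the only part that takes any real work.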

    There would be a couple of ways for the online word processor vendors to approach this. One would be to work with the ICE team. As far as I know there is nobody else out there with our commitment to generic word processing based web and print content management. The first mover would have an advantage and if it worked others would follow. The users would win.

    Another would be to invent a proprietary set of styles and toolbars and go for more of a lock-in effect. Might work. Wouldn’t be so great for the users.

    I am reminded, writing this, that all the recent activity on word processing standards hasn’t changed things much for users. For complex documents, like business documents with embedded fields and so on, interoperability between packages, both online and offline, is still really poor, and interoperability between word processing packages and the web is terrible. It’s not about whether you’re doing OOXML or ODF. It’s about what you’re doing with them.

    2008-09-03

    Put on The Fascinator

    Filed under: Uncategorized — ptsefton @ 9:30 am

    At the Australian Digital Futures Institute (ADFI, née LFII) we have been working on a software project, funded by our friends at ARROW, to build a lightweight web front-end to the Fedora Commons repository software. It used to go by the name of Sun of Fedora, which was just a temporary off the cuff in-joke kind of a name. (It uses the Apache Solr search engine).

    It now has a new name.

    Choosing a name mainly consisted of the ADFI doing a lot of ‘research’ on Google and Wikipedia and IMing each other like crazy. The process threatened to consume what remains of the project budget so we cut it short after a couple of hours.

    I suggested Christine, after the Siouxsie and the Banshees song about a person with multiple personalities, on account of the software being used to show the same repository in many different ways. Most of the ADFI staff turn out to be too young, too inattentive or too sheltered to remember Christine, although I’m pretty sure it would have been on Countdown. It would have made for a good tag-line for the software.

    Now she’s in purple, now she’s a turtle.

    Anyway, Bron Chandler suggested Fascinator, amongst many many other names. I liked that one, as it’s a kind of add-on to a hat and is typically smaller than a Fez. It also sounds a bit like ‘facet’ which is nice, as the software uses facets to help you discover stuff in the repository. I think having an ‘F’ is nice too. The Fascinator, powered by Fedora.

    This, from the current Wikipedia page is apt for a bit of open source software:

    They are available pre-made, but are also quite easy and cost effective to self assemble. They are also sold in kit form.

    Turns out The Fascinator is also the name of a ragtime tune by James Scott. I haven’t been able to source an Open Access version you can listen to, but maybe someone out there can knock it out for us using the sheet music. *


    The Fascinator it is.

    It is not an acronym and very importantly it is not in upper case but we await construction of a gratuitous backronym, from the man who brought you ARROW, ARCHER and DART or from the creator of FABULOUS and Absolutely Fabulous.

    We have soft-released the software before but now there is a new, open project site where you can download it, if you’re comfortable with Subversion and installing software on Linux and such. There are instructions for Ubuntu.

    The Fascinator will also be used in a project that Caroline Drury and I are leading to take a snapshot of the contents of Australian university institutional repositories, partly to test the software and partly to give a series of point in time snapshots of what is in them for research purposes. We’d like to look at the range of ways people describe their content and compare the way different repository platforms are used.

    I road tested the name on The Long Suffering Sandra.

    PT: You know that software I’ve been working on called Sun of Fedora?

    SC: No.

    PT: Well anyway, we’re going to call it The Fascinator. Is that a good name?

    SC: Only if it’s a project to do with hats.

    PT: Well it is, it builds on Fedora, which is a kind of repository.

    SC: In that case it’s a stupid name, you don’t put a fascinator on a fedora.

    Oh yes we do. Here’s the demo site. And besides, here’s a thing which is both a fascinator AND a Fedora. Unfortunately it’s already sold. (I hope Glamour Bomb doesn’t mind me borrowing this image).

    [Image: a hat that is both a fascinator and a fedora, from Glamour Bomb]


    * I wonder if anyone in the ADFI happens to be a piano teacher in her spare time? (There are a couple of tracks on eMusic in case you’re interested (no, I’m not an eMusic affiliate cos the form was too scary)).
