More on Buzzword

July 24th, 2008

Two people have recently reminded me about Adobe’s online word processor, Buzzword. Coincidence? Groundswell of popularity? Probably not as they are married to each other.

Anyway, it has improved a bit since I first looked at it. At least it has HTML export now (it handles lists wrongly, nesting lists inside lists instead of inside list items, but that’s a common mistake). Still no styles or headings and I fear that it is trying to get people to lock up their documents in some kind of proprietary Flash and/or PDF format.
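To make the list complaint concrete, here is a minimal Python sketch of the kind of clean-up code that would repair it. It assumes the exported HTML is well-formed enough to parse as XML, and the function name fix_nested_lists is my own invention, not anything from Buzzword or ICE.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ul", "ol"}

def fix_nested_lists(root):
    """Move lists that sit directly inside a list into the preceding list item."""
    # Take a snapshot of all elements first, because the tree is modified below.
    for parent in list(root.iter()):
        if parent.tag not in LIST_TAGS:
            continue
        previous_item = None
        for child in list(parent):
            if child.tag == "li":
                previous_item = child
            elif child.tag in LIST_TAGS:
                if previous_item is None:
                    # No item to attach to, so wrap the stray list in a new one.
                    previous_item = ET.Element("li")
                    parent.insert(list(parent).index(child), previous_item)
                parent.remove(child)
                previous_item.append(child)
    return root

bad = ET.fromstring("<ul><li>Fruit</li><ul><li>Apple</li></ul><li>Veg</li></ul>")
fixed = ET.tostring(fix_nested_lists(bad), encoding="unicode")
print(fixed)
```

The inner list ends up inside the 'Fruit' item, which is where valid HTML wants it.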

Adobe are asking for feedback so I gave some over at the Acrobat.com blogs.

I think that there’s an opportunity for Adobe to do what I think Google should have done with Google Docs (which used to be Writely). I suggested this:

What could be done differently over at Writely so they can reliably import documents and get the lists right, and better still, let people start off in Writely online and produce word processing docs to send out to others?

The Writely / Google people could design a well thought out, freely available generic word processing template that works more or less equally well in various different word processing environments (hint - you’ll need some clean-up code to help the poor word processors keep their lists straight).

http://ptsefton.com/blog/2006/03/21/writely,__meet_the_ice_template/

I think Buzzword should not only use styles, it should get a well designed set of generic styles as a basis, and the Adobe folks should build templates which are Buzzword compatible. The online service that does this first has the best chance of bridging the gap from the offline to the online world.

If I create a document in Buzzword, why not make the default export to Word use some Adobe-defined styles and give the user a Buzzword-like toolbar to play with them, post the doc back to Buzzword etc? In all the online word processors I have tried, import and export are appalling and I’m sure this must slow adoption.

At the moment all the online word processors are far behind on features that are needed for some documents. You couldn’t write a thesis in Buzzword (not if you wanted tables of contents and figures and numbering and reference management), but you could draft some stuff in there or collaborate on papers, then export into Word or FrameMaker or something to finish the job. Here a well thought out style set would really help with interop.

Adobe, if you want any advice on word processing templates, drop me a line. (Someone from Google did, but the conversation didn’t go anywhere.) The ICE project has some templates you might like to look at.

Some architectural changes to ICE

July 15th, 2008

This post is a look at some architectural changes we’re looking at for the ICE system, as we hit the limits of what we could squeeze out of the old architecture.

Ron Ward has just finished a major rewrite of lots of the application, designed to make it work on a central web server with multiple users, in addition to the ‘classic’ mode where everyone has their own ICE server running on their own computer. He’s spent the last few months trying to get Subversion to do things it was clearly never meant to do.

ICE uses Subversion as a back-end version controlled data store. In the ICE classic mode multiple users work with checked-out working copies of a repository and hit ‘Sync’ to send their changes back to the server and get updates. Behind the Sync button is a fiendishly complicated bit of code that gets updates from the server, detects conflicts, tries to resolve them as gracefully as possible and provides a usable web GUI for the authors.

Figure 1: ICE Classic mode: each user has their own ICE application which looks after their working copy. ICE uses the Subversion protocol to synchronize everyone’s work.

Ron’s big rewrite has lots of unit tests based on all the trouble we’ve come across (mis)using Subversion for the last couple of years so we’re happy that it will be robust when running in classic mode.

But the new server version is a problem. If you have multiple users trying to access the same working copy all at once, then Subversion gets in the way; it starts locking files all over the place, for example. One simple solution is just to put out a server version that doesn’t allow distributed editing like ICE classic does. But our courseware authors really need the ability to manage large volumes of stuff on their own PCs, as some courses are pretty big, with a lot of digital assets, while we want to have web access for reviewers and casual contributors to the same courses via a central web service.

So we’re looking at a new server mode where ICE still has a working copy but it knows that it is the only user-agent who has it checked out so it doesn’t need to do updates, it can just do commits. If all you want is a web based content management system then this will be all you need to install and it should run pretty well.

If you are following this technobabble then you’ll be asking: but how does that help the ICE classic users work when there’s an ICE server? That would mean that changes made on an ICE client would never make it to the server!

Figure 2: ICE Server mode: no Subversion updates required as it is the only user-agent committing changes to the working copy.

That’s the tricky part: we need to create a new mode of operation for ICE where people want the benefits of the server version AND the classic distributed mode of working. In this mode the ICE application will work in a new ‘client’ mode. It will only ever get updates from the central repository. Any additions or changes won’t be fed back to Subversion directly; the ICE client will post them into the ICE server, just like any other user.

This will require some more coding, but probably not as much as it would have taken to get the ICE server working any other way, and it opens up the possibility that we can replace Subversion and use a simpler version control system, possibly of our own devising, in future. So a future model might have the ICE server acting not only as an interface for humans but for other ICE systems.

Figure 3: ICE Client mode: Users can update their local repository but all changes go via the ICE server. We will automate this so it is seamless for users.

Having made this architectural decision we can press on with testing the ICE server straight away, even without making any changes to the client version. Here’s the plan, which we will roll through over the next few weeks:

  1. For the repositories which currently allow both server and classic access we turn off the ability for users to commit using ICE classic. If people want to check out their own copy of the content they can, as long as they post their changes back in through the server version manually.

  2. We modify the ICE server so it now assumes that it has THE working copy and only commits changes, never updates. This will mean we can support multiple users with no dramas (that’s the plan anyway).

  3. We will make a new client mode for ICE which automates the process of detecting changes and posting them from the client version of ICE through the ‘front door’ of the server version, pretty much like any other user. Updates will happen as they do now, from the Subversion repository.
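The three modes described in this post can be summed up in a few lines of Python. This is only a sketch of the plan, not ICE code: the Mode enum, the sync_steps function and the step labels are all made up for illustration, and the strings stand in for the real Subversion and HTTP operations.

```python
from enum import Enum

class Mode(Enum):
    CLASSIC = "classic"   # every user syncs a working copy with Subversion
    SERVER = "server"     # one shared working copy, owned by the ICE server
    CLIENT = "client"     # local working copy, changes posted to the server

def sync_steps(mode, has_local_changes):
    """Return the ordered operations a sync would perform in each mode."""
    if mode is Mode.SERVER:
        # Sole owner of the working copy: nothing to update, just commit.
        return ["svn commit"] if has_local_changes else []
    steps = ["svn update"]
    if mode is Mode.CLASSIC:
        steps.append("resolve conflicts")
        if has_local_changes:
            steps.append("svn commit")
    elif has_local_changes:
        # Client mode never commits; it goes through the server's front door.
        steps.append("POST changes to ICE server")
    return steps

for mode in Mode:
    print(mode.value, sync_steps(mode, has_local_changes=True))
```

The point of the design is visible in the branches: only classic mode ever has to mix updates and commits on the same working copy, which is where all the conflict handling lives.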

Tim McCallum shows off Sun of Fedora

June 27th, 2008

Here in the Repository Services group at USQ we have been working on a project funded by ARROW and in partnership with the National Library of Australia. It’s a bit of repository software originally designed to explore the Apache Solr search application.

We looked at Solr last year at USQ, and I blogged about it as part of a consulting job to compare VTLS Vital, Fez and Muradora. Since then, Muradora and Fez have both started using Solr, and there is a plugin for Fedora’s standard text search package to use Solr. As far as I know VTLS have not announced anything to do with Solr apart from their Visualizer product.

The goal of the current project is to create a simple interface to Fedora that uses a single technology, Solr, to handle all browsing, searching and security. This contrasts with solutions that use RDF for browsing by ‘collection’, XACML for security and a text indexer for fulltext search, and in some cases relational database tables as well. We want to see if taking out some of these layers makes for a fast application which is easy to configure. So far so good.
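To give a feel for how one index can do all three jobs, here is a hedged Python sketch that builds a single Solr request. The parameters q, fq, facet and facet.field are standard Solr query parameters; the field names collection, security_filter and repository are assumptions of mine, not the actual schema of our application.

```python
from urllib.parse import urlencode

def build_solr_query(text, roles, portal_filter=None, facet_field="collection"):
    """Build Solr query parameters covering search, browsing and access control."""
    if not roles:
        raise ValueError("at least one role is needed, e.g. 'guest'")
    params = [
        ("q", text or "*:*"),           # full-text search, or match everything
        ("facet", "true"),              # facet counts drive browsing
        ("facet.field", facet_field),   # hypothetical field name
    ]
    # Security: only return records readable by one of the user's roles.
    allowed = " OR ".join('"%s"' % role for role in roles)
    params.append(("fq", "security_filter:(%s)" % allowed))
    if portal_filter:
        # A portal is just another filter over the main index.
        params.append(("fq", portal_filter))
    return urlencode(params)

query_string = build_solr_query("geocoding", ["guest", "staff"], "repository:usq")
print(query_string)
```

Searching is the q parameter, browsing comes from the facet counts, and security and portals are both just filter queries over the same index.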

This is not a replacement for VTLS Vital, and is not intended to replace the NLA’s ARROW Discovery service which is also based on Solr.

We now have a working demonstration with content pulled from a number of repositories, and are able to show the main things we set out to achieve. Administrators can set up a new portal which shows a subset of the main index with a few clicks, and we have a security model which can restrict access to metadata and data based on group roles.

I will post some more information about the emerging architecture of the application soon, but for now Tim McCallum has put together a demo screencast, which had him slaving over a hot video editor over the weekend (forgive any glitches, it’s his first time). Or you can try it out for yourself (Demo URL may not work after October 2008). If you want to log in, contact me for a password.

Thanks to Oliver Lucido who did most of the development, building on work he did for the FRED project last year with David Levy. Tim has also been assisting, with project coordination from Bron Chandler and stake-holding from Neil Dickson at ARROW and Alison Dellit at the NLA.

A few words on magic

June 26th, 2008

MJ Suhonos from PKP has patiently explained where I got some things wrong about Lemon8XML in my previous hasty post.

I’d like to pick up one theme from MJ’s post. MJ says (with emphasis by me):

The larger problem, of course, is that L8X is encumbered, in a way, by the common expectation that it should just “magically” work on whatever format the author or user is providing — it is an application that is designed to solve, in part, an infinitely-unsolvable problem. So, the user has to meet the application halfway.

I agree that this expectation that tools should perform magic is a problem. We see this in the HTML export from word processors; they take arbitrary input and turn it into HTML. In the inevitable absence of magic you typically get sub-standard output.

I understand the requirement to try to understand the structure of ad hoc documents if you can, but I don’t think it’s a good idea to encourage people to keep creating them; if L8X has a version of meet me half way which involves direct formatting instead of styles then that will be a step backwards in my opinion. My version of meet me half way would be at least to try to get people to use headings. If they don’t then the structure guesser will step in, try to guess and give them their document back to correct when the inevitable errors occur.

I took a look at the single sample document for L8X on the demo site. It’s clear that the structure-guesser part of the application is going to have to be very clever to work well. It seems, for example, that the goal is to detect captions either before or after a graphic or table even when they have no special formatting. Introducing edge cases like short paragraphs both before and after an image seems to cause it problems, including loss of text, but I could be wrong, again.

(I’ve had a look at the document parser code and it is taking into account paragraph length, and doing some reasoning based on text-size and formatting attributes).
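For readers who haven’t seen this kind of heuristic, here is a toy Python sketch of structure guessing from paragraph length and formatting. It is not the L8X algorithm; the thresholds, the Paragraph class and the style names h1, h2 and p are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    text: str
    font_size: float      # in points
    bold: bool = False

def guess_style(para, body_size=12.0, max_heading_words=12):
    """Guess a style name for a directly formatted paragraph."""
    words = len(para.text.split())
    if words == 0:
        return "p"
    short = words <= max_heading_words
    # Headings rarely end with sentence punctuation.
    ends_like_sentence = para.text.rstrip().endswith((".", ":", ";", ","))
    if short and not ends_like_sentence:
        if para.font_size >= body_size * 1.5:
            return "h1"
        if para.font_size > body_size or para.bold:
            return "h2"
    return "p"

print(guess_style(Paragraph("Introduction", 18.0, bold=True)))
```

Even this tiny version shows the problem: a short bold caption looks exactly like a heading, which is why every guess has to go back to the author for correction.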

So, even though I had some of the architecture wrong, I still think that Lemon8 XML would be vastly more useful if it had a two part architecture:

  1. Styled word processing document to XML conversion, with the obvious caveat that if you’re turning a generic format into a domain specific one you’re going to be producing stuff that doesn’t use the whole of the target format and may have gaps that need to be filled in.

    Lemon8 XML has its own XML format, but I’m wondering if it couldn’t just use ODF which is a well specified standard, with the ability to give the document back to the user. (Checking with MJ via email about this).

    The goal would be to get as many people using this mode as possible because it is the least work for everyone: no guessing structure required if people can use markup.

  2. Ad hoc-formatting to styled word processing conversion using the best available heuristics to guess structure and give the document back to the author in an improved form. As far as I can tell that’s not a goal for the PKP team, but the code is out there so we could do it, using their algorithm. We’re looking into it.

It is important to help our colleagues who are authoring documents in word processors to use styles. It’s good for them. It will improve their working lives. And it will open the door for them to start dealing with real eResearch and the semantic web. A project like TheOREM-ICE would be impossible with documents like the L8X sample document.

Lemon8 XML beta released

June 23rd, 2008

The PKP people have released a beta of Lemon8-XML (L8X), their journal-oriented word processor-driven XML publishing system.

I tried out the demo server with an ICE test document.

The bad news is that the service had significant problems with my document; it could not locate author metadata, incorrectly identified some ordinary text as being citations, and lost most of the document text, which is obviously a very major issue.

The good news is that MJ Suhonos from PKP was onto me straight away with an email and is keen to work on support for styles in general and ICE styles in particular. (It’s in the FAQ that we will collaborate on this).

If the PKP team can get a decent structure guessing application to work on arbitrary input that would be great, but even better would be to close the loop and give back documents with more structure than you put in. At the ICE project we will help however we can.

If it was me doing this I would break this problem into two parts:

  1. Build a converter that can take structured word processing documents and map them to the NLM XML format used by L8X. ICE offers one well worked out structure for generic documents, others may exist for specific formats.

  2. Build a structure-guessing application to add structure to word processing documents (something which Ian Barnes has been chipping away at for a while).

With both of these in place you can improve documents in the wild as you go; every time someone submits a draft, add styles and give it back to them, rather than trying to guess structure at the end. I would like to see this embedded in the OJS journal management system from PKP so that authors get rapid and continual feedback every time they upload a draft. This would allow some editorial and review processes to take place in an HTML interface as well, rather than via PDF or word processing files.

If you leave L8X as the final step, authors will have little feedback as to how they can improve the structure of their drafts.

My two-part plan would make re-ordering sections in L8X redundant: word processors have outlining tools with which you can reorder content, so why try to do it through an HTML interface?

On a technical note, last time I looked at L8X I concluded that Docvert is a weak link: it tries to use XSLT to guess structure. Our experience with ICE was that XSLT (version one at least) was not a productive way to do this, as the austere functional programming environment in XSLT made the structure-reasoning code very hard to maintain and very slow, so we moved to a more traditional parser written in Python, which is much easier for typical programmers to work with.

An ICE like ODF based web publishing system

June 20th, 2008

From Kay Ramme at the GullFOSS blog at Sun comes this demo of a wiki-like system using ODF as a document format and OpenOffice.org as an editor.

It seems to be using WebDAV to allow users to edit documents on a server, then convert them to HTML automatically when they load the document in a browser.

Good idea to have the user change a document and automatically render it to HTML on request.

Same idea, in fact, as the ICE system.

Some differences with ICE:

  • ICE doesn’t use WebDAV because, well, it doesn’t work with Windows reliably and it doesn’t work with the Mac too well either.

  • ICE doesn’t rely on OpenOffice’s native save as HTML feature which will produce awful results on all but the simplest text documents. A few of several reasons not to use it:

    • It gets list formatting badly wrong.

    • It exports photos at full resolution and puts height and width attributes on them to resize them, meaning that you end up shipping megabytes when you should be shipping kilobytes.

    • It is not styles-based so you have no way of configuring it to do things like use pre formatted text in the right places.

  • ICE is styles-driven, which means it produces very clean HTML compared to the rubbish that office suites spit out.

  • ICE uses templates to help people apply styles.

  • ICE can deal with Microsoft Word documents and has cleanup code to correct some of the interop issues with OpenOffice.org.

  • ICE has a version-controlled back end courtesy of Subversion so it can be used by distributed teams.

  • ICE can create IMS content packages for courseware.

  • ICE has an Atom Publishing Protocol button which can send stuff to a blog and do a much better job of formatting than the Sun Weblog Publisher addin too.

  • ICE has a plugin architecture and a growing number of hooks for integrating other content types like chemistry data.

  • ICE doesn’t deal with spreadsheets, but we could add that pretty easily.

  • ICE doesn’t have a mechanism to create new pages by linking to a target that doesn’t exist; if we add that we’ll make it a bit smoother than what’s shown in the demo.

  • ICE can be used as a conversion service by other systems.

I could go on.

If you like the demo, check out some of ours although I note that we don’t have a really basic one that shows what Kay shows in hers. We’ll get on to that.

Adventures in Geocoding part 2: Embedding data points in documents

June 19th, 2008
[update: the map doesn’t seem to work well in IE - works well for me in Firefox.]

I have been thinking about how to start integrating more semantics into ICE documents. This is only a preliminary look, but it’s very promising so far.

I wrote a while ago about embedding metadata in pictures. This time I look at how one might embed geographical data in a document.

I was tempted to do a dog-poo map of East Toowoomba showing how my hounds like to defecate as far as possible from a rubbish bin so I get to carry the re-used plastic bag further, but I’ll spare you that and show you something else.

Take this cycle hazard for example. I have linked the picture to a web album where you can see it in context. The caption has the location in the text so if you download the PDF you can find the location for yourself.

graphics1

One of many hazardous grates on Ruthven Street Toowoomba (-27.590334, 151.948166)

Or I could point out another dangerous place where the cycle lane disappears (-27.595667, 151.947174).

If everything is working correctly, you should see a map somewhere in this post (I’m still wavering about where to put it) showing those two points; and if you click on the little pins you will get the description for that point. I doubt it will work in places like Google Reader so click through to the post.

What I’ve developed here is actually not ICE specific. All I have done is adapt a little bit of JavaScript of Simon Willison’s to go through a page and look for HTML elements marked with the class attribute ‘geo’. It’s pretty dumb at the moment, and it relies on a convention that each location has an optional description followed by the coordinates in brackets. It only handles decimal degrees, not degrees and minutes, and would spit the dummy if you said 27.6045° S instead of -27.6045.
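The convention is simple enough to sketch. The real thing is JavaScript running in the page; what follows is a Python illustration of the same rule (optional description, then decimal coordinates in brackets), with a pattern and function name of my own, so treat it as an approximation rather than the script itself.

```python
import re

# Optional description, then "(latitude, longitude)" in decimal degrees.
GEO_PATTERN = re.compile(
    r"^\s*(?P<description>.*?)\s*"
    r"\(\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*\)\s*$"
)

def parse_geo(text):
    """Return (description, lat, lon) for text marked as 'geo', or None."""
    match = GEO_PATTERN.match(text)
    if match is None:
        return None
    lat, lon = float(match.group("lat")), float(match.group("lon"))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return (match.group("description"), lat, lon)

caption = ("One of many hazardous grates on Ruthven Street Toowoomba "
           "(-27.590334, 151.948166)")
print(parse_geo(caption))
```

Anything that doesn’t fit the pattern, such as a hemisphere letter instead of a minus sign, simply comes back as None.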

To use this in ICE I have to set up some javascript stuff to load everything in, install it in the blog server and so on, which took me a stupid amount of time, but in the documents themselves it couldn’t be easier. I defined a new style called i-geo (i is for inline) and ICE automatically converts that to HTML spans with class=geo when I generate the HTML.

By coincidence, there was a post this week from Roderic Page on mining PDFs for geographical data. Great stuff. It’s very like the work that Peter Murray-Rust’s group and others do with mining chemistry data from PDFs. But there are problems:

The service uses a bunch of regular expressions to try and extract latitude and longitude pairs from the text (needless to say, there are nearly as many different ways to write a latitude and longitude as there are authors).

http://iphylo.blogspot.com/2008/06/from-pdfs-to-google-earth.html

What we want to do in ICE is provide authors with easy to use tools so they can unambiguously encode data and validate it before they hit the ‘publish’ button. One way we plan to do this is to adapt an application like Roderic’s tool. In this case I’d point it at my document and it could tag all the coordinates I’ve got and normalize them to my preferred method of expressing coordinates, then mark them up in some way. Ultimately this will be more robust than these fragile after-the-fact scraping services. My document will be able to advertise its own meaningful content not cling to it jealously until it is exhumed by an application later on which has to pry it out of its cold dead PDF-fingers.
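As a sketch of what normalizing at authoring time might involve, here is some Python that accepts either signed decimals or degrees with a hemisphere letter and returns signed decimal degrees. It is not Roderic’s code or anything in ICE; the pattern covers only the couple of notations mentioned in this post and the name to_decimal is made up.

```python
import re

# Degrees, optional minutes and seconds, then a hemisphere letter.
DMS_PATTERN = re.compile(
    r"""^\s*(?P<deg>\d+(?:\.\d+)?)\s*°?\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*['′]\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*["″]\s*)?
        (?P<hemi>[NSEW])\s*$""",
    re.VERBOSE | re.IGNORECASE,
)

def to_decimal(value):
    """Normalize one coordinate to signed decimal degrees."""
    text = value.strip()
    try:
        return float(text)          # already a plain signed decimal
    except ValueError:
        pass
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError("unrecognized coordinate: %r" % value)
    degrees = float(match.group("deg"))
    degrees += float(match.group("min") or 0) / 60
    degrees += float(match.group("sec") or 0) / 3600
    # South and west are negative by convention.
    if match.group("hemi").upper() in "SW":
        degrees = -degrees
    return degrees

print(to_decimal("27.6045° S"))
```

Raising an error on anything unrecognized is deliberate: the author gets to fix the coordinate before hitting ‘publish’, instead of a scraper guessing later.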

We’re going to do something like this with the TheOREM project, too. It’s in the work plan to run the OSCAR chemistry-sniffer-outer application over documents and get it to mark all the bits of chemistry, as well as give its automatic sanity check; once that’s done we can start pushing out chemistry with built-in semantics.

Now, I bet if Bruce D’Arcus is reading this he’d be saying ‘Use the new metadata support in ODF 1.2’ and I will investigate that. But, given the user base I deal with, an OpenOffice.org-only solution is not optimal; we also need a solution that will work for groups who use other tools such as Microsoft Word. The style-based microformat approach is one such interoperable mechanism. Styles work, but are a bit tricky to apply. I like simple links even better.

In the geo-world geohash looks pretty cool. Geohash is an algorithm which can turn this:

graphics2

Into this short URL: http://geohash.org/r7h51ehscv0g

I have set up my little Javascript experiment so that if I add a link it will automatically push a pin into my map. The Geohash algorithm is open, so I don’t need the Geohash service to use it. I found an open source library easily enough, and the URL makes a perfectly good identifier in my opinion. Yeah yeah it’s got the http protocol on the front but it’s a unique string for a point on the earth and more importantly I can use it in pretty much any modern editor.
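Since the algorithm is open, it fits in a screenful of code. This is my own Python sketch of the standard Geohash scheme (interleaved longitude and latitude bits, five bits per base-32 character), not the library I used in the page; the function names are made up.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def geohash_decode(geohash):
    """Decode a Geohash to the (lat, lon) at the centre of its cell."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for char in geohash:
        index = BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)

# The last path segment of a geohash.org URL is the hash itself.
url = "http://geohash.org/r7h51ehscv0g"
print(geohash_decode(url.rsplit("/", 1)[-1]))
```

Each extra character narrows the cell, so a twelve-character hash pins a point down far more precisely than any map pin needs.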

So I could use a simple link to point out a place where the road is in terrible condition and have that point show up on my map. If you grab the PDF version of this page, you’ll see that the links are all footnoted automatically so I don’t have to type in coordinates or mess with styles. I just link. ICE has a nice feature where it can footnote all the links in my documents for the PDF version too, so the information is there in a usable form in print. If we wanted to get really fancy it could decode the Geohash into a human readable format for the print view.

What I’m thinking about now is a framework for semantic markup in word processors and beyond that takes into account all the prior art (smart-tags in Word for example) and the practical realities of mixed-application workgroups and a Microsoft-heavy world. I might try to put something together for the forthcoming e-Research in the Arts, Humanities and Cultural Heritage workshop. We have a part-written paper about embedding metadata in documents lying around that may serve as a base.

More on negative click or net benefit repositories

June 16th, 2008

So the conversation that Chris Rusbridge started about low-effort repositories rolls on. Chris summarizes some of the responses, including mine, and broadens the discussion to bring in some of the stuff that Andy Powell has been saying:

Andy wants repositories to be more consistent with the web architecture. He spoke at a Talis workshop recently; his slides are here (on Slideshare, one of his models for a repository).

This reminded me that earlier this year people in my network were talking about Andy’s keynote at VALA. We responded to the ripples running through the Oz-repos community by putting a project proposal to ARROW in Australia to start working on a repository ingest application that is much more ‘of the web’ than those we have now.

The ARROW board didn’t approve that one. I’m sure it wasn’t just the name that was wrong, but I gather that was not popular. And it was a truly stupid name.

I had to think of something quickly so I called it VICE-SQUAD in the spirit of highly contrived acronyms that seems to pervade the ARROW community.

VICE SQUAD means (VITAL-compatible Integrated Content Environment-driven Service-oriented Queryable User-friendly Application for Data-acquisition)

Here’s a bit of the proposal we put, which seems to be along the lines of what Chris from the Logical Operator blog suggested in response to Chris R. From our proposal:

The goal of this project is to build a smart user-friendly repository ingest system for VITAL and/or other Fedora based repositories, which will be implemented in the Integrated Content Environment (ICE) service framework. The system will be released as open source software. The application will be a stand-alone ingest system with back-end coupling to ICE.

The project will attempt to create an innovative interface for repository ingest which is quite different from other approaches, allowing users to upload content into a working repository, or workbench, from where it can be shared with the world (sharing with defined groups is out of scope for this project but will be dealt with in a separate USQ project) and/or submitted for ingest into the repository; i.e. pushed over the curation boundary.

It will consist of three interfaces:

  1. A dead-simple user interface for academics to share their work as quickly as possible and tag it with free-form metadata. They will upload items to a workbench where they will be able to work on them further, or merely mark them for ingest into the repository.

    (see this blog post from the JISC repositories interest group for some thinking along the same lines, with pointers to a commercial service called box.net which could serve as a model for the sharing-features proposed here if adapted to an academic context.)

  2. A graphical user interface for repository staff or advanced users to edit MODS metadata for a record and turn the user’s initial tags into formal metadata, including the ability to edit existing metadata records from VITAL.

  3. A seamless tie-in to a structured authoring environment, so that papers authored in such an environment can be sent to a repository with a single click

In addition to the three interfaces there will be behind-the-scenes ‘smarts’ that can extract metadata from documents and produce HTML and PDF automatically, using technologies already developed by USQ.

I think the time has come for someone to build a repository which has the simple ePrints approach to collecting metadata, with an option to make it even simpler and just go with tags if that’s all the energy the depositor can muster.

Our proposal goes on to talk about MODS and MARC and METS but I think maybe the time is right to do RDF, especially if the Bibliographic Ontology makes it into Zotero. And we should look at ORE support rather than bother with METS.

For those who care to add more higher-quality metadata (often this is a librarian tidying up later) there needs to be just a little bit more smarts than ePrints or DSpace offer with their flat metadata, in the area of stuff like research affiliation and researcher identity, stored in RDF, with an option to serialize it in other metadata formats as required.

While we didn’t get that project up with ARROW we will have another opportunity to build on the forthcoming TheOREM-ICE work.

We have a big need for simple sharing in ICE right now, and I imagine that this will be true for thesis writing too. Wouldn’t it be great to share your PhD draft with reviewers and draft-readers in a simple way?

One thing I’d like to do is to turn on document sharing via an obscure non-guessable URL so that people can drop in and comment on my documents using ICE’s inline annotation systems without authentication. Or for more formal collaboration, I want to be able to create ad hoc workgroups preferably using a single sign on service of some kind. Once we get through some of the nasty issues we’re having with the ICE 2 beta version we will no doubt start adding those collaborative features.
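The non-guessable URL part is cheap to do. Here is a minimal Python sketch, assuming an in-memory table and a made-up /share/ URL layout; the ShareLinks class is hypothetical and nothing like it exists in ICE yet.

```python
import secrets

class ShareLinks:
    """Map unguessable tokens to document paths (in memory, for illustration)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._paths = {}

    def create(self, document_path):
        # 16 random bytes gives a 22-character URL-safe token.
        token = secrets.token_urlsafe(16)
        self._paths[token] = document_path
        return "%s/share/%s" % (self.base_url, token)

    def resolve(self, url):
        # Unknown or mistyped tokens resolve to None.
        token = url.rsplit("/", 1)[-1]
        return self._paths.get(token)

links = ShareLinks("http://ice.example.org")
share_url = links.create("thesis/chapter1.odt")
print(share_url, "->", links.resolve(share_url))
```

Anyone holding the link can get to the document and nobody else can find it, which is about the right level of security for draft-readers.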

Then when TheOREM kicks off we’ll have an ICE to repository gateway pumping content into DSpace, Fedora and ePrints.

What’s needed as well are some simple services to let people upload stuff and push it out. ICE already lets you push to a blog via ATOM (all the posts here are done that way), but we could add SlideShare and Flickr and suchlike as additional services, as well as a simple web sharing interface that is less controlled than the Institutional Repository. As Peter Murray-Rust says: Don’t use Institutional Repositories, put it on the web.

Deflation in repository clicks

June 11th, 2008

Chris Rusbridge is contributing to repository click deflation at the Digital Curation Blog with a post about Negative Click Repositories.

Why deflation?

At Open Repositories 2008 a group of us Australian developers entered the Repositories Challenge with an entry entitled Zero Click Ingest [1]. The introduction puts it like this:

This micro-project demonstrates a way to eliminate the repository deposit step altogether, by having the repository software take responsibility for collecting the content that it needs. It involves using the Integrated Content Environment (Sefton 2006)[2] (ICE) as a document authoring system, but the principle could be applied to other content management systems which support metadata or category-aware ATOM or RSS feeds, with the ability to supply the requisite formats. We show how documents created and managed in ICE can be automatically ingested into a repository at the appropriate time, based on document state.
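The repository side of that idea can be sketched in a few lines of Python: read a category-aware Atom feed and pick out the entries whose state says they are ready. The sample feed, the category term ‘published’ and the function name are assumptions for illustration, not what the challenge entry actually used.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A made-up feed: document state is carried in the category term.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:doc-1</id>
    <title>Draft paper</title>
    <category term="draft"/>
  </entry>
  <entry>
    <id>urn:example:doc-2</id>
    <title>Finished paper</title>
    <category term="published"/>
  </entry>
</feed>"""

def entries_ready_for_ingest(feed_xml, ready_term="published"):
    """Return (id, title) for every entry whose categories include ready_term."""
    root = ET.fromstring(feed_xml)
    ready = []
    for entry in root.findall(ATOM + "entry"):
        terms = {cat.get("term") for cat in entry.findall(ATOM + "category")}
        if ready_term in terms:
            ready.append((entry.findtext(ATOM + "id"),
                          entry.findtext(ATOM + "title")))
    return ready

print(entries_ready_for_ingest(SAMPLE_FEED))
```

A repository polling a feed like this never needs a depositor to click anything; the author just changes the document’s state.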

So we did zero click but now we have to go for negative clicks? That’s not fair!

There are two aspects of this I’d like to comment on.

  1. Chris is calling for other people’s take on the negative click repository, so I’ll contribute my thoughts.

  2. It would be wrong to think that we can only get people to use new software if it means they don’t have to learn anything new. I don’t buy that; people can change when they see the benefits. The trick is to get them to want to change by themselves.

My thoughts on repository workflow

I’ve been thinking about repository workflow ever since I joined the RUBRIC project in 2005. I was outraged that this Institutional Repository business seemed to be all about putting PDFs into ugly web sites. Here were all these custodians of collections of a few thousand PDF files getting together at conferences to talk about how they were doing Web two-dot-oh. Um, no you’re not. Try using a web format first. The program committee at Open Repositories 2007 didn’t accept my paper along those lines but they let me give a poster. That poster is available in HTML, PDF and as a slideshow, all generated from a single source document. Now that’s at least web 1.01.

In 2006, I talked about some of the possibilities for smart content management systems (like ICE) to supply content directly to repositories, in Workflow 2.0.

We still haven’t done most of what I talked about in Workflow 2.0, although the Zero Click Ingest demo was a big step in proving the concept. But we’re about to make a big push in this direction in collaboration with Jim Downing and team at Cambridge as part of the JISC TheOREM project.

In TheOREM we’re going to set up ICE as a ‘Thesis Management System’ where a candidate can work on a thesis which is a true mashup of data and document, aka a datument [3]. When it’s done and the candidate is rubber-stamped with a big PhD, the Thesis Management System will flag that, and the thesis will flow off to the relevant IR and subject repositories, as a fully-fledged part of the semantic web, thanks to the embedded semantics and links to data.

One more idea: consider repository ingest via Zotero or EndNote. In writing this post I just went to the OR08 repository and used Zotero to grab our paper from there so I could cite it here. It would be cool to push it to the USQ ePrints with a single click.

That’s one click but it’s about fifty less than the alternative. Is that what you mean by negative, Chris?

Net-benefit repository workflows

I’m not sure I’m happy with the implications of the term negative click here (even ‘zero clicks’ is a bit suss). As Chris puts it:

So we need to develop repositories that make the data work to take human work away: negative click repositories.

Not that I don’t agree that we want to take the work away, just that I think this will involve change and I think we should be careful not to imply that we can perform magic.

At USQ, the Integrated Content Environment is on the way to becoming a ‘core’ system for producing courseware. It has been available for a few years under an open source license, but as far as I know it’s not used much at all outside of USQ. Why?

I think it’s because it represents a net loss for most users; they don’t need to do what ICE does and/or they can’t afford the time to set it all up. This is why almost nobody has taken it up spontaneously for general writing tasks or even for courseware outside of our institution.

But at USQ, where we are reaffirming our commitment to flexible delivery (we call it Fleximode), staff know they have to create resources that suit on-campus, web and print use. ICE helps with that, so it has grown organically from our first user to a couple of hundred because overall it makes life easier. Learning ICE is not trivial: you have to do some training, you have to change the way you work, and the organization needs to supply support. If you do use it, though, there’s a net benefit.

I think it is a mistake to focus on the Negative Click repository as a goal and assume that it’s going to happen without changing user behavior. If people are bashing away in word processors as though they are typewriters then you miss the opportunity to create XHTML reliably, and you throw out heaps of semantics. We’ll be stuck with PDF repositories for a long time until we empower our authors to do better.

If we can seed the change by empowering some researchers to perform much better (to more easily write up their research, to begin to embed data and visualizations, to create semantic-web-ready documents) then their colleagues will notice and make the change too. That’s two things:

  1. Make it easier.

  2. Make the result sexier.

This is one reason why TheOREM is so exciting; not that it’s going to look at ORE but that it will be a first step to providing tools for PhD candidates and their supervisors that I hope will be the envy of others, just as Shirley Reushle used the first version of ICE to make an online course that met the USQ standards and her colleagues saw it and wanted to do the same.

At one point in the development of ICE we were beating off potential users. I reckon we can do the same with eResearch and repositories but it’s hard work and it takes time.

[1] L. Monus et al., Zero Click Ingest, Apr. 2008; http://pubs.or08.ecs.soton.ac.uk/119/.

[2] P. Sefton, The Integrated Content Environment for Research and Scholarship, ICE Website, 2006; http://ice.usq.edu.au/introduction/ice_rs.htm.

[3] P. Murray-Rust and H.S. Rzepa, The Next Big Thing: From Hypermedia to Datuments, Journal of Digital Information, vol. 5, 2004, p. 248; http://jodi.tamu.edu/Articles/v05/i01/Murray-Rust/?printable=1.

Why wasn't I using styles in diagrams?

June 7th, 2008

I have said here many times: use styles. So why wasn’t I doing so when I could have been?

Regular readers will know that I’m a bit word processor-obsessed. In the word processor a style is a name for a bundle of formatting and structural information. For example, in most word processors if you use the Heading 1 style, then that helps to structure your document, and the formatting is taken care of automatically; today 18pt Helvetica, tomorrow 16pt Times. You can generate a Table of Contents automatically, and you stand a better chance of getting good HTML export.
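For the programmers, a style is just a level of indirection. Here is a small sketch of that idea, not how any particular word processor stores things: the Style and Paragraph classes, the "Body Text" style name and the sample document are all assumptions for illustration. Paragraphs point at a style by name, so changing the definition changes every paragraph that uses it, and the table of contents falls out of the names.

```python
from dataclasses import dataclass


@dataclass
class Style:
    """Formatting bundled under one name."""
    font: str
    size_pt: int


@dataclass
class Paragraph:
    text: str
    style_name: str  # a paragraph refers to a style by name, not to formatting


def render(document, styles):
    """Resolve each paragraph's style name to whatever it means right now."""
    lines = []
    for para in document:
        style = styles[para.style_name]
        lines.append(f"{para.text} [{style.size_pt}pt {style.font}]")
    return lines


def table_of_contents(document):
    """Structure comes from the style names: headings are what is tagged as one."""
    return [p.text for p in document if p.style_name.startswith("Heading")]


styles = {
    "Heading 1": Style(font="Helvetica", size_pt=18),
    "Body Text": Style(font="Times", size_pt=12),  # assumed body style
}
document = [
    Paragraph("Use styles", "Heading 1"),
    Paragraph("A style is a name for a bundle of formatting.", "Body Text"),
    Paragraph("Diagrams too", "Heading 1"),
]

today = render(document, styles)

# Change the definition once; no paragraph is touched.
styles["Heading 1"] = Style(font="Times", size_pt=16)
tomorrow = render(document, styles)

print(today[0])     # Use styles [18pt Helvetica]
print(tomorrow[0])  # Use styles [16pt Times]
print(table_of_contents(document))
```

Hand-formatting is the opposite: the font and size get copied onto every paragraph, so there is nothing to change in one place and nothing for a table of contents to find.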

I don’t think so much about diagrams.

This week I tried out Cmap Tools, a concept mapping application that we’ve been working with on the ICE project because some of our courseware authors use it to provide scaffolding for learners to use in constructing knowledge. We are looking at taking this a bit further, under the direction of Professor Jim Taylor; more on that soon.

I tested it out drawing a couple of architecture diagrams and I liked a couple of things about it, one of which was the use of styles.

Here’s a fragment of a concept map. The orange boxes are made orange with a style called Data, and the yellow one is in style SoF, which is the working name of the application we’re building; SunOfFedora. If I change my mind about the colours I can change the style and the diagram will update itself.

[Figure: fragment of a concept map, with boxes in the Data and SoF styles]

The other thing I liked about Cmap is that it represents relationships as propositions. It even shows your diagram in proposition form if you want.

[Figure: the diagram shown in proposition form]
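The proposition idea is easy to sketch in code. This is not Cmap's data model or export format, just my reading of the principle: every connector joins two concepts with a linking phrase, so each one reads as a little sentence. The concepts and phrases below are invented for illustration.

```python
# Hypothetical concept map: each link is (concept, linking phrase, concept).
concept_map = [
    ("Author", "writes", "Document"),
    ("Document", "is formatted with", "Styles"),
    ("Repository", "collects", "Document"),
]


def as_propositions(links):
    """Read each labelled connector as a sentence."""
    return [f"{source} {phrase} {target}" for source, phrase, target in links]


def concepts(links):
    """List each concept once, in order of first appearance."""
    seen = []
    for source, _, target in links:
        for concept in (source, target):
            if concept not in seen:
                seen.append(concept)
    return seen


for sentence in as_propositions(concept_map):
    print(sentence)
print(concepts(concept_map))
```

A drawing made from connectors with labels on them holds the same information, as long as every connector gets a label.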

I decided that while Cmap is interesting, I don’t want to use an application that forces me to store things in a certain directory and which loses my work by deciding that it’s not going to save the map I’ve been working on for an hour or so. And I’d like more flexibility about the shape of the shapes I’m joining up.

I can achieve pretty much the same visual effect using the drawing tools that come with OpenOffice.org; joining objects with connectors, and putting labels on the connectors.

So I started redoing some of this work using Draw in NeoOffice and it struck me that I was going to miss the Cmap approach to styles. But wait, Draw does have styles, it’s just that I have never noticed, and never thought about looking for them. So for a few years I’ve been drawing dodgy diagrams and inefficiently hand-formatting drawing elements to look alike, just the same as many people use their word processor before discovering styles.

Now that I think about it, I must have heard of this feature, but I never put two and two together and used it in my occasional diagramming. One reason I never noticed, I think, is that it’s not available in Microsoft Office (and probably by extension in OOXML) which I was using for many years; Microsoft set my expectations and I continued to expect diagramming in an office suite to be painful.

Use Styles.