ptsefton

2008-12-13

Old Slides

Filed under: Uncategorized — ptsefton @ 2:00 am

Dorothea Salo:

I can’t establish that I was the first to decry “build it and they will come” thinking with regard to institutional repositories.

http://cavlec.yarinareth.net/2008/12/12/two-slides/

Susan Gibbons decried it in 2006 with regard to the Rochester repository, in her slide “Early Misunderstandings”:

Early Misunderstandings

  • Faculty see need for repository

  • Faculty see value in repository

  • Self-archiving is a practical expectation

  • Technology is the difficult part

Build it and they will come!

Sounds like a prescient bullet point summary of a cavlec post or two.

I remember Susan’s talk as inspiring. They built it. Users didn’t come. They responded by doing anthropology on the users to find out what they wanted, in order to work out what to build next time, and followed up with projects like the one I mentioned here where I gave some advice along with Jim Downing, whose desk I’m sitting at right now in Cambridge.

(The Rochester work is cited in Roach Motel I believe.)

2008-12-11

Scholarly Publishing using the Integrated Content Environment

Filed under: Uncategorized — ptsefton @ 11:45 pm

2008-12-11

Peter Sefton

University of Southern Queensland

sefton@usq.edu.au

1 Orientation

This is another blog post based on a presentation. What I’ve been doing lately is working on each presentation as a document with embedded slides before I give it. I think that working this way makes for a more coherent delivery and I can take a bit of time to edit the thing into something a bit more useful than a PowerPoint presentation*. Of course it’s only ever half done by the time I give the talk so I have to come back and finish it off and edit to make it reflect what I think I actually said.

Here are the notes for my talk at the recent Open Access Publishing workshop run by APSR. The website bills it as “A two-day Public Knowledge Project Workshop”, while the program says it was a “PKP User Group Workshop”. Either way, the idea was to talk about issues around the mission of, and the software produced by, the PKP project. The website says:

The Public Knowledge Project is a research and development initiative directed toward improving the scholarly and public quality of academic research through the development of innovative online publishing and knowledge-sharing environments. More…

This was the last APSR event and I’d like to take the opportunity to say thanks to the APSR crew, particularly Margaret Henty, for all the APSR events of various kinds over the last few years. These events got us started in the repository business, and they helped bootstrap the RUBRIC project.

I started with this abstract:

Abstract: The Integrated Content Environment (ICE) is an open source software application and service-platform for word processor based academic authoring, originally built to support the University of Southern Queensland’s flexible-delivery courseware but now expanding into more general scholarly publishing.

We will show a myriad of ways that ICE can be used in the academy. Starting from document creation, ICE uses generic word processing templates which capture the structure and semantics of scholarly documents. The system manages collaborative works in progress using a distributed version-controlled repository, and can publish works to HTML, PDF and domain-specific XML schemas. It has been integrated with several other systems, including the Moodle Learning Management System and DSpace, ePrints and Fedora repositories.

Specific examples will include a journal which uses the ICE publishing system, demonstrations of semantic-web publishing for theses (an outcome of the JISC-TheOREM ICE project) and institutional repository integration.

2 Introduction

The Integrated Content Environment (ICE) is an open source software application and service-platform for word processor based academic authoring, originally built to support the University of Southern Queensland’s (USQ’s) flexible-delivery courseware but now expanding into more general scholarly publishing.

ICE is a product of a software development team now located in the Australian Digital Futures Institute (ADFI) at USQ. The Institute is involved in research and development projects in eLearning and eResearch.

When I first wrote up this talk, I thought it might come out a bit negative. I had this paragraph:

In this presentation I will show off ICE and talk about how some of its modules can be used in and mashed-up with other systems. Unfortunately, the conclusion to this presentation is not “and they all lived happily ever after”. After showing some of what ICE can do, a lot of which I believe is very important to the scholarly enterprise, I will go on to talk about the challenges that lie ahead. All we have to do is change the entire scholarly publishing model and along the way make our publishing systems much more flexible and far reaching.

But actually, the tone of the meeting made it a bit hard to summon the doom and gloom. We had a very productive exchange on how we might mesh our technologies and experience with the PKP systems.

3 About ICE

I kicked off with some credits. ICE is a team effort; I make the most noise but I don’t do the most work.

Credits (in random order)

  • Ron Ward

  • Pamela Glossop

  • Oliver Lucido

  • Daniel de Byl

  • Bron Chandler

  • Linda Octalina

ICE is more than one thing. See the website.

ICE components

  • Word processing templates (MS Word and OpenOffice.org)

    Easy to use, generic, extensible structured editing.

  • A web application for converting word processing documents into HTML and PDF, packing and disseminating them.

    (This normally runs on the desktop, one copy per user.)

  • A set of APIs so that ICE components can be used in other systems.

The templates are at the heart of the ICE system. They provide a simple generic way to structure word processing documents. It is well known, if not a scientifically documented fact, that using a word processor to create web pages typically results in, shall we say, sub-optimal web sites.

ICE is style-driven, not as in fashion conscious, but as in styles-as-named-bundles-of-formatting which have structural or semantic significance.

ICE is style driven

About the only styles your word processor can export to a web page are Heading 1 and family.

So ICE defines styles for:

  1. Headings with and without outline numbering.

  2. Lists: lots of flavours, including bullets, definition lists and various ways of counting enumerations.

  3. Blockquotes and preformatted text.
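As a rough illustration of what style-driven conversion means, here is a minimal Python sketch. The style names and the mapping are invented for the example; they are not ICE’s actual template styles or conversion code.

```python
from html import escape

# Hypothetical style names, chosen for illustration only.
# ICE's real templates define their own style names.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Blockquote": "blockquote",
    "Preformatted": "pre",
}


def paragraph_to_html(style, text):
    """Turn one styled paragraph into an HTML element.

    Paragraphs with an unknown style fall back to a plain <p>.
    """
    tag = STYLE_TO_TAG.get(style, "p")
    return f"<{tag}>{escape(text)}</{tag}>"
```

The point of the sketch is that the style name, not the visual formatting, decides which element comes out the other end.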

To convert word processing documents to HTML and PDF ICE uses OpenOffice.org. ICE converts Word documents to the OpenDocument Format behind the scenes, a step which is not necessary with OOo Writer. And from there there’s code to turn ODF into HTML. ICE uses Writer to generate web-ready images from the various graphic formats, equations etc.
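The order of those steps can be sketched in a few lines of Python. This is only a description of the pipeline as outlined above, not ICE’s code: the function name, the step labels and the set of file suffixes treated as Word documents are all assumptions made for the example.

```python
from pathlib import Path

# Assumed suffixes for Word documents; illustrative only.
WORD_SUFFIXES = {".doc", ".docx"}


def conversion_plan(filename):
    """List the conversion steps for one source document, in order."""
    suffix = Path(filename).suffix.lower()
    steps = []
    if suffix in WORD_SUFFIXES:
        # Word files take an extra hop through OpenDocument Format.
        steps.append("Word to ODT")
    elif suffix != ".odt":
        raise ValueError(f"unsupported document type: {suffix!r}")
    steps.append("ODT to HTML")
    steps.append("ODT to PDF")
    return steps
```

For example, `conversion_plan("chapter1.odt")` skips the first hop, while `conversion_plan("chapter1.doc")` includes it.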

ICE can be used in lots of ways. It started life as a courseware authoring system. At USQ we have a decades-long tradition of producing printed distance education materials, and for the last decade or so of making the same content available on the web, with added webbish goodness.

Starting from document creation, ICE uses generic word processing templates which capture the structure and semantics of scholarly documents. The system manages collaborative works in progress using a distributed version-controlled repository, and can publish works to HTML, PDF and domain-specific XML schemas. It has been integrated with several other systems, including the Moodle Learning Management System and DSpace, ePrints and Fedora repositories.

While ICE can be used to manage thesis drafting, it doesn’t really manage the other processes that go on, particularly the examination process, where PKP’s Open Journal Systems (OJS) might help. OJS is used at USQ to manage the thesis review process for some theses.

Speaking of journal systems, I showed an example of a journal which was published using the ICE publishing system:

A Journal

http://www.usq.edu.au/electpub/e-jist/docs/vol10_no1/default.htm

ICE gave no help with the journal workflow, but it did create HTML and PDF renditions of all articles.

There’s very little overlap between ICE and OJS.

ICE wrt OJS

[Diagram: ICE with respect to OJS]

I was able to demonstrate some work that Linda Octalina has been doing to integrate ICE conversion services into OJS. She’s added a basic feature where you can upload an ICE document and, instead of treating it as a monolithic word processing file, OJS fires it off to ICE to be converted into other formats, and then shows them to you:

[Screenshot: OJS showing HTML and PDF renditions of an uploaded file]

In the above screenshot you can see how OJS now has an HTML and PDF rendition for each file. If we could make this work then OJS would be able to use ICE features like embedding data such as chemistry or inline annotations. Here’s a screenshot that covers a few ICE features:

[Screenshot: a selection of ICE features]

The final point above, about a generic structured document profile, seemed to resonate with MJ Suhonos from PKP. We’ve corresponded for a while about getting ICE and his Lemon8-XML project better aligned. Now that we’ve met and had a couple of glasses of wine that should be a bit easier.

If MJ decides to use ODF we’ll be there to help with the process of getting the Open Document Format word processing format (ODT) plugged into Lemon8-XML as a back-end, a byproduct of which will be what I was hoping for in my previous post, a way to bring unstructured documents into the ICE fold.


* As I always say a used PowerPoint (particularly without speaker notes) is about as much use as a used condom. Fun for those who were there, maybe, but lacking appeal for those who were not.

2008-12-09

More thoughts on thesis embargoes

Filed under: Uncategorized — ptsefton @ 7:48 am

I wrote last time about how we might do thesis embargoes with ICE as part of the TheOREM-ICE project we’re doing with Jim Downing and team at Cambridge. That post was mostly about why we wouldn’t want to add complex access control at a very granular level to ICE.

I’m actually in Cambridge now and I’ve talked the issue over with Jim Downing and Nick Day. We think we’ve come up with a workable, implementable prototype for thesis embargo which I’ll describe here. But first some background about the requirements.

We’ve been whiteboarding thesis workflow with three broad stages:

  1. Drafting, where a small group have access to the emerging document plus its data. The group could be as small as one candidate and one supervisor but others may be allowed in. We don’t need fine grained access control for this bit and embargoes are not relevant.

  2. Examination, where the thesis goes off to examiners. Again you don’t want to embargo anything or what would they be examining? There will be some indication of which bits are going to be embargoed though so examiners can take care not to talk about those bits to others. Access control is crucial, but so is management of the examiners and managing their feedback. It’s not clear that we would want to add these features to ICE, maybe to OJS?

  3. Making public, where we do need to worry about embargoes. At this stage the content should be out of the drafting system; this is what the ARROW people called crossing the curation boundary.

While we were talking about this it came to light that one of the theses that we’re looking at for the TheOREM-ICE project has a rather unusual embargo requirement. The thesis is OK to go on the web apart from the acknowledgements section which is apparently considered by the author to be a private matter. The solution we came up with will deal with that.

Here it is:

In ICE each chapter/part of the document will be a separate document, in Word or OpenOffice.org Writer, as per the way ICE does courseware. This is much safer for book-length content than trying to use Word on a single file, anyway.

The candidate will enter embargo information into each chapter using the techniques outlined in our joint paper that Jim Downing recently delivered at the IDCC conference in Edinburgh. That is, at the top of each file will be a place to put some embargo information. We have to work out how this will look but it will be something like:

Embargo period (in months)

  • Leave blank or put zero for no embargo

  • Put -1 for indefinite embargo

6

Embargo manager (optional)

http://ptsefton.com/pt

This is human readable, so it will give cues to supervisors and examiners about how the data are to be treated, but it will also be machine readable so that downstream systems can be used to enforce the embargoes.
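To make the machine-readable side concrete, here is a minimal Python sketch of how a downstream system might interpret those two fields. Everything in it is an assumption made for illustration: the `Embargo` record, the function names, and in particular the choice to count the embargo period from the date the thesis is awarded, which has not been decided here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Embargo:
    release_date: Optional[date]  # None when the embargo is indefinite
    indefinite: bool
    manager: Optional[str]        # identity URL of the embargo manager, if given


def add_months(start, months):
    """Move a date forward by whole months (day clamped to 28 for simplicity)."""
    index = start.month - 1 + months
    return date(start.year + index // 12, index % 12 + 1, min(start.day, 28))


def parse_embargo(period_text, award_date, manager=""):
    """Interpret the embargo period field: blank or 0 = none, -1 = indefinite."""
    text = period_text.strip()
    months = int(text) if text else 0
    if months == -1:
        return Embargo(None, True, manager or None)
    if months < 0:
        raise ValueError("embargo period must be -1, 0 or a positive number of months")
    return Embargo(add_months(award_date, months), False, manager or None)
```

With the example values above, `parse_embargo("6", date(2008, 12, 9), "http://ptsefton.com/pt")` gives a release date six months after the assumed award date.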

The downstream system we have in mind to demonstrate this is The Fascinator, which was designed for just this sort of thing.

Once the thesis is awarded, someone in authority will be able to click the ‘Make Public’ button, which will push the content to an instance of The Fascinator using an OAI-ORE resource map to describe the contents of the thesis in excruciating detail including the metadata about embargoes. The Fascinator will index the embargo dates and enforce access controls using its normal approach of limit queries. That is, the guest account will always have a query like this added:

issue_date<=$todays_date and total_embargo=false

This says: make sure that we only show things issued on or before today which are not under unconditional embargo. At the moment it seems like the simplest solution for embargo dates will be to use the Dublin Core issue date element. If an item has an issue date in the future then The Fascinator will refuse to serve it to guests.

The embargo manager field in the metadata example above holds an OpenID. It is a place to capture an identity that is controlled by the candidate so they can come back and manage embargoes later, after their local institutional login may have been turned off. We don’t know how to store that metadata just yet.
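The effect of that limit query can be mimicked in a few lines of Python. This is a sketch of the rule only, not of how The Fascinator implements it: the dictionary keys simply reuse the field names from the query, and the function names are made up for the example.

```python
from datetime import date


def visible_to_guest(item, today):
    """Apply the rule: issue_date <= today and total_embargo is false."""
    issue_date = item.get("issue_date")
    if issue_date is None:
        # No issue date recorded: err on the side of hiding the item.
        return False
    return issue_date <= today and not item.get("total_embargo", False)


def guest_results(items, today):
    """Keep only the items a guest account would be allowed to see."""
    return [item for item in items if visible_to_guest(item, today)]
```

An item with an issue date in the future, or one flagged as totally embargoed, simply never appears in a guest’s results.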

We may not be able to implement all of this immediately because we’re coming up to the silly season in Australia, but I’m sure that we will be able to demo parts of this scenario fairly quickly.

2008-12-02

Embargoes on bits of theses: skating on thin ICE?

Filed under: Uncategorized — ptsefton @ 6:07 am

I gave myself a task* after the last TheOREM-ICE teleconference to look into how ICE might be used for fine-grained thesis embargoes.

I have not seen a full spec but I gather from the conversation that sometimes you want to make a thesis available but place some of the data, or maybe a chapter or two under embargo. Jim Downing proposed that a repository would still advertise the ORE resource map of the thesis on the web but some parts would be unfetchable by anonymous access until the embargo period is expired. Could we use ICE to do that?

Maybe not such a good idea.

I’ll talk here about why not, and outline some other possibilities. Of course you should be cautious when I start talking like this ‘cos based on prior form I’m probably really just trying to get money to start a new software development or integration project.

For a start, embargoes are really quite different from the sorts of access control you want in an authoring system. They are time based, they have to work on the wider web rather than the intranet, and they apply once the thesis is finished, so the thesis really should be out of the document authoring system at that point.

ICE is designed for document authoring and collaboration and only has fairly broad-brush access control. In the courseware we work with, access is by a whole team to a whole course. For a thesis it is similarly simple: you have the ability to add stuff and your supervisor can comment. I’m not at all sure that it would make sense to add complex document-level access features to ICE. Instead, why not concentrate on the ICE templates and conversion system? That is the bit we do better than anyone else that I know of. We could integrate the templates into other systems and leave the business of writing content or document management systems to all the other contenders. (One reason why not is that many CMSs are pretty hopeless at managing multiple renditions of the same content and don’t have plug-in convertors, but there must be at least some we could work with.)

I’m not at all sure that ICE itself should be pushed too much further into thesis management past the authoring stage. It certainly looks like it would be good for drafting a thesis and getting your supervisor to comment, and it’s definitely on the cutting edge of being able to mashup document content with data visualization, but we don’t have all the elaborate approval and review steps that you’d need for the internal and external processes that follow. One promising lead springs from the way that the Maths & Computing department here at USQ use the Open Journal Systems (OJS) to manage theses. We’re exploring the idea of an ICE/OJS mashup. Stay tuned for a report from the APSR Open Access Publishing Workshop later this week where I will put this idea and many others to the technical stream.

For delivery I think The Fascinator might be a good way to manage the kind of embargoed access that the TheOREM team have identified as a requirement. ICE could manage the authoring, with examination via something like OJS, and then use an ORE resource map to pass the thesis to a departmental or institutional repository running The Fascinator, which IS designed to do access control. It could then republish the resource map to the world-wide repository grid and manage the embargoes. Either at the examination stage or at the repository deposit stage, the candidate would set up the embargoes, which would need to be kept as metadata against the components of the thesis.

Thinking about this led me to the idea of putting something like The Fascinator on the desktop, letting it find all your stuff, giving you a simple way to organize it into projects, embargo bits of it and so on, and then automate the process of disseminating it to the institutional and other places you’d like it to go. I’m thinking of something like Picasa (which finds all your pictures on your hard drive no matter how embarrassing or not safe for work they are) and iTunes, which, although in my opinion potentially evil, has some nice ways of browsing and organizing content, but with a connection to the world wide repository grid. More on this idea soon.


* We’re doing this project more or less in the open so you can poke around and see what we’re up to.
