<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ptsefton</title>
	<atom:link href="http://ptsefton.com/feed" rel="self" type="application/rss+xml" />
	<link>http://ptsefton.com</link>
	<description>This seems to be a workblog</description>
	<lastBuildDate>Tue, 09 Mar 2010 03:12:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Back to the wordprocessor</title>
		<link>http://ptsefton.com/2010/03/09/back-to-the-wordprocessor.htm</link>
		<comments>http://ptsefton.com/2010/03/09/back-to-the-wordprocessor.htm#comments</comments>
		<pubDate>Tue, 09 Mar 2010 03:12:49 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2010/03/09/back-to-the-wordprocessor.htm</guid>
		<description><![CDATA[



It&#8217;s been quite a while since I have looked at word processing here. Over the next couple of weeks I&#8217;m going to revisit some of the ongoing themes I cover on this blog about document formats and Scholarly HTML and so on, working with my colleague Ron &#8220;If The Encodings Don&#8217;t Get You The Namespaces [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2010/03/09/back-to-the-wordprocessor.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc" />
<div>
<p>It&#8217;s been quite a while since I have looked at word processing here. Over the next couple of weeks I&#8217;m going to revisit some of the ongoing themes I cover on this blog about document formats and <a href="http://delicious.com/ptsefton/scholarlyhtml">Scholarly HTML</a> and so on, working with my colleague Ron <span class="spCh spChx201c">&#8220;</span>If The Encodings Don&#8217;t Get You The Namespaces Will<span class="spCh spChx201d">&#8221;</span> Ward, a veteran word processor wrangler, the man who wrote most of the rendering and conversion code in the ICE application. (I&#8217;m still working on the <a href="http://delicious.com/ptsefton/andsmetadatastores">metadata stores</a> thing, too).</p>
<p>So, in no particular order, some things we&#8217;re going to look at.</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>I&#8217;ll come back to the wonderful world of Custom XML in Word 2007, <a href="http://ptsefton.com/2009/12/23/bye-bye-word-2007-custom-xml.htm">which may be an ex-feature</a>, and the work that Microsoft Research did on various authoring plugins that used it, and try to find out what&#8217;s happening there. We have a visitor coming this week who worked with MS Research on their ontology plugin. More about that soon.</p>
</li>
<li>
<p>I&#8217;m going to revisit the Word 2007 interface, which frankly still frightens me; I dread that process of hunting through weird interface widgets for something I used to know how to do in muscle memory.<span class="footnote" style="vertical-align: super;"><a class="footnote" href="#ftn1" name="ftn1-text" title="1 Look, I'm not afraid of learning new things; why, at the moment I'm more or less learning to play the mandolin, which is tuned in 5ths rather than 4ths like most of a guitar &#8211; that's a whole lot of new fingerings, but it makes sense and it's fun.">1</a></span></p>
</li>
<li>
<p>Interop between Microsoft Word 2007 and OpenOffice.org across <code>.doc</code>, <code>.docx</code> and <code>.odt</code>. This is not just a technical issue; there&#8217;s politics involved. For instance, I think <a href="http://en.wikipedia.org/wiki/Sun_microsystems">Sun</a>&#8217;s evangelical policy used to be <span class="spCh spChx201c">&#8220;</span>Over our dead body will Writer save into that filthy Microsoft .docx format, although we will let you read it in, in order that your documents might be saved.<span class="spCh spChx201d">&#8221;</span> (That&#8217;s <i>saved</i> as in saved from eternal damnation <i>as well as</i> saved in the one true open format). Dear <a href="http://en.wikipedia.org/wiki/Larry_Ellison">Larry</a>, we have the dead body part, can we have Save as <span class="spCh spChx2026">&#8230;</span> .docx now<span class="footnote" style="vertical-align: super;"><a class="footnote" href="#ftn0" name="ftn0-text" title="2 Yes I can save as .docx here in Ubuntu, but that&apos;s a special build with some Novell code in it, I believe.">2</a></span>? </p>
<p>Closer to home there are political issues at USQ with ICE and how it works with some classes of documents. Over in the Faculty of Engineering I gather people are being forced to use ICE for maths-heavy courses. Bron Chandler has been working hard to help the course-maintainers in engineering find an acceptable way for them to use Word or Writer plus ICE, but there <i>are</i> limits to how far we can push these word processors for heavy duty technical documents. I&#8217;ll say it again: I don&#8217;t think it makes sense to mandate the use of a tool like ICE at a university; we should talk about performance based standards for materials and remain pragmatic. If we can&#8217;t find a cheap way to make HTML versions of maths-heavy materials then we may have to settle for PDF and we may have to let people use LaTeX, or DocBook or whatever (I&#8217;ve never heard of anyone wanting to use DocBook but there are plenty of LaTeX enthusiasts). </p>
</li>
<li>
<p>Ron is going to look at how we might be able to use Word instead of or in addition to OpenOffice.org in ICE. We picked OpenOffice.org as the main engine-room for ICE several years ago because it is available cross-platform but what if we could do one or more of the following on Windows?</p>
<ul class="lib">
<li>
<p>Use Word to generate PDF <span class="spCh spChx2013">&#8211;</span> should improve the WYSIWYG fidelity (Ron&#8217;s already got this automated, although not hooked in to ICE yet.)</p>
</li>
<li>
<p>Use Word rather than OpenOffice to render the image parts of a page, again to improve the way things look (initial experiments look promising). ICE uses a two-part process to convert to HTML. It converts OpenDocument format XML into HTML, but to get images such as charts or equations, it also calls OpenOffice&#8217;s Save as HTML feature. </p>
</li>
<li>
<p>Maybe replace OpenOffice.org altogether on Windows by automating Word to save as .odt behind the scenes. (If that works, there is still the issue of how to make books out of individual course documents. At the moment this is a very complex process which automates Writer, as the master document feature doesn&#8217;t work for what we want.)</p>
</li>
<li>
<p>And if the above fails, maybe rewrite the ICE renderer for Office Open XML rather than Open Document Format. Big job we&#8217;d rather not have to resource.</p>
</li>
</ul>
</li>
</ol>
<p>Along the way, this is going to force me to spend some time in Windows for the first time in a long time, guess I&#8217;ll have a chance to learn about Windows 7.</p>
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/" name="HTTP:::DBPEDIA.ORG:SNORQL:?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%"><img alt="HTTP://DBPEDIA.ORG/SNORQL/?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2010/03/m40ca94ba1.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
<hr />
<div style="font-size: .9em;"><span class="footnote-defined"><a href="#ftn1-text" name="ftn1">1</a> Look, I&#8217;m not afraid of learning new things; why, at the moment I&#8217;m more or less learning to play the mandolin, which is tuned in 5<sup>th</sup>s rather than 4<sup>th</sup>s like most of a guitar <span class="spCh spChx2013">&#8211;</span> that&#8217;s a whole lot of new fingerings, but it makes <i>sense</i> and it&#8217;s fun.</span></div>
</p>
<p>
<div style="font-size: .9em;"><span class="footnote-defined"><a href="#ftn0-text" name="ftn0">2</a> Yes I can save as .docx here in Ubuntu, but that&#8217;s a special build with some Novell code in it, I believe.</span></div>
</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2010/03/09/back-to-the-wordprocessor.htm/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More details on a metadata store for data in/alongside VITAL</title>
		<link>http://ptsefton.com/2010/03/04/more-details-on-a-metdata-store-for-data-inalongside-vital.htm</link>
		<comments>http://ptsefton.com/2010/03/04/more-details-on-a-metdata-store-for-data-inalongside-vital.htm#comments</comments>
		<pubDate>Thu, 04 Mar 2010 00:55:03 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2010/03/04/more-details-on-a-metdata-store-for-data-inalongside-vital.htm</guid>
		<description><![CDATA[



Here&#8217;s another post about the ANDS metadata store work I&#8217;ve been doing. I was at the University of Newcastle last week working with Vicki Picasso and Dave Huthnance, with calls to Teula Morgan at Swinburne. Together we fleshed out a model for how Newcastle might run a data-registry alongside their VITAL repository.
I [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2010/03/04/more-details-on-a-metdata-store-for-data-inalongside-vital.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc">
<ul>
<li><a href="#id2">1 Requirements</a></li>
<li><a href="#id4">2 Implementation</a>
<ul>
<li><a href="#id6">2.1 Components</a></li>
</ul>
</li>
<li><a href="#id13">3 Risks</a></li>
</ul>
</div>
<div>
<p>Here&#8217;s another post about the <a href="http://delicious.com/ptsefton/ptsefton+andsmetadatastores">ANDS metadata store</a> work I&#8217;ve been doing. I was at the University of Newcastle last week working with Vicki Picasso and Dave Huthnance, with calls to Teula Morgan at Swinburne. Together we fleshed out a model for how Newcastle might run a data-registry alongside their VITAL repository.</p>
<p>I want to use this blog post to expand on the last blog post and refine the details of what will become an ANDS project plan for someone to develop this stuff. I&#8217;m going to talk about the <a href="http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm">diagram from the last post</a>, in particular the application which has become known as <span class="spCh spChx201c">&#8220;</span>The Red Box<span class="spCh spChx201d">&#8221;</span>.</p>
<p><span style="display: block"><a name="graphics1" /><img alt="graphics1" class="fr2" height="426" src="http://ptsefton.com/wp-content/uploads/2010/03/m117b3490_643x426.jpeg" style="border:0px; vertical-align: top" width="643" /></span></p>
<h1><a id="id2" name="id2" />1 Requirements</h1>
<p>The main requirements for this system, as collected from the stakeholders are that it:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>Can provide a <b>university-wide registry of research data</b> (the data collections will reside wherever they currently reside under access control, and I believe there is a policy in development that open data will all be served from the research storage service) with two main inputs:</p>
<ul class="lib">
<li>
<p>Form-based deposit, with dead-simple workflow using a system which is as easy for the library and research office to customise as the current VALET system, with an option to port the existing ingest workflows as well (that&#8217;s why there&#8217;s a dotted red line around the current forms system).</p>
</li>
<li>
<p>Discovery of data files and data collections on the university&#8217;s new research storage facility.</p>
</li>
</ul>
</li>
<li>
<p>Work alongside the VITAL institutional repository by storing metadata (and potentially small collections themselves) in Fedora, which is the underlying storage layer for VITAL.</p>
<ul class="lib">
<li>
<p><b>VITAL can be used as the portal for access to research collection metadata or not</b>, as appropriate. Teula Morgan told us that at Swinburne they would likely connect the research data collections into their discovery layer rather than using VITAL. The following models need to be supported:</p>
<ul class="lib">
<li>
<p>VITAL as the authoritative source for digital objects, with RedBox deleting objects which it has handed off to VITAL.</p>
</li>
<li>
<p>RedBox as the authoritative source for digital objects, with VITAL or another portal acting as the discovery interface for research data collections. With the addition of a portal, this configuration could form the basis of a standalone metadata store.</p>
</li>
</ul>
</li>
<li>
<p>The system needs to <b>stay out of VITAL&#8217;s way as much as possible</b> to avoid the risk of unapproved material &#8216;leaking&#8217; through VITAL and to avoid affecting performance.</p>
</li>
<li>
<p><b>The system should provide for batch-changes to repository content</b>, in order to assist in cleaning up the document data that is already in there prior to adding research data. (This component came up both in discussions about how we can work in a linked-data way <span class="spCh spChx2013">&#8211;</span> adding URIs for people mentioned in the repository and as a requirement from Swinburne where they lament the lack of batch-editing tools in VITAL).</p>
</li>
<li>
<p>The system needs to have <b>no dependencies on the proprietary parts of VITAL</b>, to allow the possibility of switching repositories.</p>
</li>
</ul>
</li>
<li>
<p>As far as possible <b>deal only with the collections side of things</b>, without attempting to become a management system for activities (research projects) or parties (people). Following from this, to be a <span class="spCh spChx201c">&#8220;</span>Linked Data<span class="spCh spChx201d">&#8221;</span> application where Activities and Parties are referred to by URIs, and other terms also use well-defined URIs rather than strings. (There is a potential side-project to this one to upgrade the NicNames system <span class="spCh spChx2013">&#8211;</span> more on that soon).</p>
</li>
</ol>
<p>Caroline Drury, the CAIRSS/ANDS liaison person, pointed out that we should discuss how these requirements go beyond encouraging people to register data via the ANDS &#8216;register my data&#8217; service for Research Data Australia. The big things are:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>This allows an institution to have additional management metadata that is not in the RDA system, such as details of how long data should be held. We&#8217;re working with Simon Porter from the University of Melbourne on this. The idea is that when this is implemented with University X they will work out the metadata they need and the developer would assist in creating the input forms and mappings from one metadata format to another.</p>
</li>
<li>
<p>This provides for curation by data librarians who can leave some submissions in the queue while they sort out details of the metadata or wait for data to be made available on the storage facility.</p>
</li>
<li>
<p>This also encompasses data which is not destined for or not ready for listing on RDA.</p>
</li>
<li>
<p>But most importantly it will allow the institution to meet its obligations under <a href="http://ands.org.au/resource/code.html">The Code.</a></p>
</li>
</ol>
<h1><a id="id4" name="id4" />2 Implementation</h1>
<p>The deliverable for the work we&#8217;re doing at USQ for ANDS is an ANDS project plan. That makes sense, as it makes starting up the next phase of work straightforward. But to put forward a complete plan, we need to make some assumptions about the design, mainly in what technologies we would use. So here&#8217;s a proposed broad-brush architecture for a metadata stores solution to work with VITAL. Remember this is only a proposal and the reason it&#8217;s up on this blog is so you can comment on it.</p>
<p>The large-scale assumptions are:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>RedBox will use <b>Fedora 3 as an internal storage component <span class="spCh spChx2013">&#8211;</span></b> with data synchronised to VITAL as needed. This meets the requirement that VITAL is functioning as a portal and will reduce stress on the VITAL repository as much as possible. Fedora is an obvious choice for ARROW/VITAL sites. We are choosing to work with the latest version for the RedBox component, but it will need to synchronise with Fedora 2 which underpins the VITAL product.</p>
</li>
<li>
<p>The application will be <b>developed in Java</b>, building on The Fascinator platform which was originally sponsored by the ARROW project in 2008 and which has been under development at USQ since then. Benefits include:</p>
<ul class="lib">
<li>
<p>Being Java it can sit in the same Tomcat web-server as Fedora and the Apache Solr indexer used by most Fedora repositories these days.</p>
</li>
<li>
<p>The ingest component we&#8217;re developing for The Fascinator, while incomplete, meets the requirement that the system be configurable in a similar way to VALET <span class="spCh spChx2013">&#8211;</span> where extending the forms and integrating them with external systems like CrossRef is trivially easy. Existing VALET forms can also be ported to the new system (it&#8217;s a manual process, but not difficult).</p>
</li>
<li>
<p>It&#8217;s highly modular and so can be used, for example, without a portal, a role which VITAL or a discovery service can take on. One of the most important modules will be plugin harvest technology to pick up content that&#8217;s on the storage system and provide a view to researchers and data librarians to begin describing it. The Fascinator has an extensible system of plugins and we already have file-indexers and a framework for extracting metadata from files, which can be extended to work with new kinds of research data as they appear.</p>
</li>
<li>
<p>Our developers know the system, meaning we can be up and running with this application very quickly.</p>
</li>
</ul>
<p>We are aware of other Fedora software components, but none that meet all of the above criteria. The closest would probably be Muradora <span class="spCh spChx2013">&#8211;</span> if there are ANDS contributors using that who want help setting it up for research data then I think that would be in scope for us to look at in our ANDS work.</p>
</li>
<li>
<p>While there was some discussion about using VALET or VITAL as the foundation for the RedBox early in the project, neither of these systems has an architecture which can work on a university-wide scale if all data sets are to be described, or the ability to harvest metadata about files on the research storage service.</p>
</li>
</ol>
<h2><a id="id6" name="id6" />2.1 Components</h2>
<p>I&#8217;ll take a brief look at some of the components in turn.</p>
<h3><a id="id7" name="id7" />2.1.1 Batch editing</h3>
<p>The Fascinator already has the basis for a batch-change tool. The Fascinator, whether it is sitting on top of Fedora or our simple file back-end, has an indexer component which can either watch a queue for changes in a repository or do a complete re-index. </p>
<p>This indexer has an extensible set of rules which are small scripts that can be fired off to deal with various kinds of content. In our desktop work we use this for things like generating web-ready versions of video content and HTML versions of documents, but it could also be used to transform datastreams in the repository to make bulk changes.</p>
<p>Here&#8217;s an example of how it might be used. I talked previously about setting up a system with the NLA or locally using NicNames to assign unique IDs to researchers before we start the major part of this project. Once we had a set of name identities, they could be put back into the repository by writing an &#8216;indexing&#8217; rule that, for each document, looks up the names in the name authority system and puts the matching identifiers into the MARCXML datastream in an appropriate field. Then you tell the system to re-index.</p>
<p>For a batch edit, you&#8217;d back everything up, then set this up and run it on a copy of the repository and swap in the new data and test thoroughly with VITAL, before either running the process on the live repository or swapping in a new Fedora database underneath.</p>
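<p>To make the batch-edit idea a little more concrete, here is a minimal sketch in Python. It is not The Fascinator&#8217;s rule API: the function names, the dictionary standing in for the name authority, the example URI and the plain dictionaries standing in for MARCXML datastreams are all assumptions made for illustration.</p>

```python
# Hypothetical sketch: none of these names come from The Fascinator's API.

# Stand-in for a name-authority service such as NicNames (name string to URI).
NAME_AUTHORITY = {
    "Smith, John": "http://example.org/people/1",
}


def add_creator_uris(record, authority):
    """Return a copy of the record with a URI attached to every known creator."""
    updated = dict(record)
    updated["creators"] = [
        {"name": name, "uri": authority.get(name)}  # None when the name is unknown
        for name in record.get("creators", [])
    ]
    return updated


def reindex(repository, rule, authority):
    """Apply the rule to every record, as a complete re-index would."""
    return {
        object_id: rule(record, authority)
        for object_id, record in repository.items()
    }


# Work on a copy of the repository, never the live one.
repository_copy = {
    "vital:1": {"title": "A thesis", "creators": ["Smith, John"]},
    "vital:2": {"title": "A paper", "creators": ["Jones, Mary"]},
}
result = reindex(repository_copy, add_creator_uris, NAME_AUTHORITY)
print(result["vital:1"]["creators"])
```

<p>The rule returns new records rather than changing the originals, which suits the run-it-on-a-copy-then-swap approach: the source data stays untouched until the output has been tested.</p>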
<h3><a id="id8" name="id8" />2.1.2 OAI-PMH feeds</h3>
<p>One of the key deliverables for this project is to get data flowing to <a href="http://services.ands.org.au/home/orca/rda/">Research Data Australia</a>. Part of the project scope will be to make sure that we have fully-functional OAI-PMH feeds, complete with support for deletions, working from both VITAL&#8217;s copy of Fedora and from the RedBox itself to meet the requirement that the system is able to run in <span class="spCh spChx201c">&#8220;</span>headless<span class="spCh spChx201d">&#8221;</span> mode. Xiaobin Shen from ANDS in Melbourne has been working with OAI-PMH providers, so we don&#8217;t expect this part to be hard, just a matter of careful selection, configuration and testing.</p>
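<p>As a rough illustration of what support for deletions means on the harvesting side: in OAI-PMH a deleted record is signalled by a status of &#8220;deleted&#8221; on its header element. The Python sketch below builds a tiny stand-in for a ListRecords response and separates live identifiers from deleted ones. The identifiers are made up and the helper names are mine, not part of any provider we might select.</p>

```python
import xml.etree.ElementTree as ET

# The OAI-PMH 2.0 XML namespace.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def make_header(parent, identifier, deleted=False):
    """Append a record with an OAI-PMH header; deleted records carry status="deleted"."""
    record = ET.SubElement(parent, f"{{{OAI_NS}}}record")
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    if deleted:
        header.set("status", "deleted")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
    return record


def split_by_status(list_records):
    """Return (live, deleted) identifier lists from a ListRecords element."""
    live, deleted = [], []
    for header in list_records.iter(f"{{{OAI_NS}}}header"):
        identifier = header.findtext(f"{{{OAI_NS}}}identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


# Build a small stand-in response rather than calling a real feed.
response = ET.Element(f"{{{OAI_NS}}}ListRecords")
make_header(response, "oai:example.org:collection-1")
make_header(response, "oai:example.org:collection-2", deleted=True)

live, deleted = split_by_status(response)
print("live:", live)
print("deleted:", deleted)
```

<p>A test along these lines, run against both the VITAL feed and the RedBox feed, would be one way to check that a withdrawn collection description actually shows up as deleted downstream.</p>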
<h3><a id="id9" name="id9" />2.1.3 VITAL configuration</h3>
<p>One of the major tasks is to create customisations for VITAL to display metadata records about data and make sure that the result fits in to the rest of the VITAL repository portal.</p>
<h3><a id="id10" name="id10" />2.1.4 The Forms interface</h3>
<p>While we&#8217;re not proposing to use the Squire forms interface sponsored by ARROW directly, the code we will be using is based on and informed by the same VALET model. This system consists of:</p>
<ul class="lib">
<li>
<p>Form-templates using the same Velocity template engine for Java as VITAL uses, to make this easy to deploy for ARROW sites.</p>
</li>
<li>
<p>Form widgets that allow for linked-data lookups. The idea will be to use sources such as People Australia or a NicNames instance to provide a URI for a name; type in <span class="spCh spChx201c">&#8220;</span>John Smith<span class="spCh spChx201d">&#8221;</span> and it will give you a list of John Smiths and the areas in which they work to pick from, with similar lookups for research projects. Where there is no suitable John Smith then the forms application will create a new one with a temporary URI via a call to the name-authority.</p>
<p>Another thing the ingest system will need is a link to the Identify My Data service or a local equivalent for those sites who are sold on the benefits of using Handles as persistent identifiers for their collections<span class="footnote" style="vertical-align: super;"><a class="footnote" href="#ftn0" name="ftn0-text" title="1 I&apos;m not convinced that using the ANDS handle infrastructure is a good idea. Firstly, ANDS is only funded for a short time and while ANDS staff express their aspirations for keeping things going there are no guarantees; secondly, data has to have a URL pointing to where it is stored, and that has to be maintained as well, with redirects and so on when things change. In a lot of use-cases I think that using Handles just increases complexity, cost and risk.">1</a></span>.</p>
</li>
<li>
<p>Simple workflow descriptions like the following sample:</p>
</li>
</ul>
<pre>&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"stages": [</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">{</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"name": "init",</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"security": ["owner", "admin"],</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"visibility": ["guest", "owner", "admin"],</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"template": "${user.home}/.fascinator/workflows/templates/basic-init.vm"</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">},</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">{</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"name": "live",</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"security": ["owner", "admin"],</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"visibility": ["guest", "owner", "admin"],</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">"template": "${user.home}/.fascinator/workflows/templates/basic-live.vm"</span></span></pre>
<pre>&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">}</span></span></pre>
<pre>&#160;&#160;&#160; <span style="font-size:9.75pt; font-style:normal; font-weight:normal; "><span class="T2">]</span></span></pre>
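<p>Here is a small Python sketch of how a workflow description like the sample above might be read and queried. The reading of the two lists is my assumption for the purposes of the example: &#8220;security&#8221; is taken to be the roles allowed to act on an item in a stage and &#8220;visibility&#8221; the roles allowed to see it. The helper functions are hypothetical and are not part of The Fascinator.</p>

```python
import json

# The sample above, wrapped in braces so that it parses as a JSON document.
WORKFLOW = json.loads("""
{
    "stages": [
        {
            "name": "init",
            "security": ["owner", "admin"],
            "visibility": ["guest", "owner", "admin"],
            "template": "${user.home}/.fascinator/workflows/templates/basic-init.vm"
        },
        {
            "name": "live",
            "security": ["owner", "admin"],
            "visibility": ["guest", "owner", "admin"],
            "template": "${user.home}/.fascinator/workflows/templates/basic-live.vm"
        }
    ]
}
""")


def find_stage(workflow, name):
    """Return the stage with the given name, or None if there is no such stage."""
    for stage in workflow["stages"]:
        if stage["name"] == name:
            return stage
    return None


def can_edit(stage, role):
    # Assumption: "security" lists the roles allowed to act on an item in this stage.
    return role in stage["security"]


def can_view(stage, role):
    # Assumption: "visibility" lists the roles allowed to see an item in this stage.
    return role in stage["visibility"]


init = find_stage(WORKFLOW, "init")
print(can_view(init, "guest"), can_edit(init, "guest"))
```

<p>Keeping the workflow as plain data like this is what should make it easy for library and research office staff to add a stage or change who can see it without touching application code.</p>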
<h3><a id="id12" name="id12" />2.1.5 Research storage harvest &amp; grants database triggers</h3>
<p>The University of Newcastle has a research storage facility (a <a href="http://en.wikipedia.org/wiki/Storage_area_network">SAN</a>) and policies are under development which will likely see lots of RDA-ready data made available on the facility. One of the features of the RedBox application will be to be able to harvest metadata about such files; at the very least, file-paths, sizes and dates, and any other metadata that can be automatically extracted. For data that&#8217;s on the storage facility the idea is that researchers can add it to the metadata store by finding the data files themselves and clicking <span class="spCh spChx201c">&#8220;</span>Describe this data<span class="spCh spChx201d">&#8221;</span> to fill out a form.</p>
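<p>The bare-minimum harvest described above (file-paths, sizes and dates) can be sketched in a few lines of Python. This is only an illustration using the standard library on an ordinary directory tree; the function name and record layout are assumptions, and the real harvester plugins would have more to deal with.</p>

```python
import os
import tempfile
from datetime import datetime, timezone


def harvest_file_metadata(root):
    """Walk a directory tree and return path, size and modified date for each file."""
    records = []
    for directory, _subdirs, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(directory, filename)
            info = os.stat(path)
            records.append({
                "path": os.path.relpath(path, root),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return records


# Demonstrate on a throwaway directory standing in for the storage facility.
with tempfile.TemporaryDirectory() as storage:
    with open(os.path.join(storage, "survey.csv"), "wb") as handle:
        handle.write(b"id,value\n1,42\n")
    harvested = harvest_file_metadata(storage)

print(harvested)
```

<p>Each record here is the kind of stub a researcher could then pick up and complete via the form.</p>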
<p>The Fascinator has a number of features in this area such as the ability to generate thumbnail versions of files and web-ready renditions of various formats, which is of peripheral interest to the metadata stores activity.</p>
<p>The harvester system is highly configurable, so it could be set up to index or watch other kinds of storage service. One that Newcastle require is the ability to harvest an email account with messages from the grants database about grant completions, which are events that should trigger a metadata librarian to chase up data from researchers.</p>
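<p>The email trigger could work something like the following Python sketch. I don&#8217;t know what the grants database messages actually look like, so the subject-line phrase, the sample message and the shape of the resulting task are all assumptions for illustration.</p>

```python
from email import message_from_string

# Assumption: completion notices can be recognised by a phrase in the subject line.
TRIGGER_PHRASE = "grant completed"


def completion_task(raw_message):
    """Return a follow-up task for a grant-completion email, or None for other mail."""
    message = message_from_string(raw_message)
    subject = message.get("Subject", "")
    if TRIGGER_PHRASE not in subject.lower():
        return None
    return {
        "action": "chase up data from researchers",
        "subject": subject,
        "received_from": message.get("From", ""),
    }


# A made-up message; the address and grant number are hypothetical.
sample = (
    "From: grants@example.edu\n"
    "Subject: Grant completed: G0001\n"
    "\n"
    "The project has reached its end date.\n"
)
task = completion_task(sample)
print(task)
```

<p>The point of the design is that the harvester only raises the task; a metadata librarian still decides what to do about it.</p>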
<h1><a id="id13" name="id13" />3 Risks</h1>
<p>There will be a detailed risk assessment for ANDS, but here are some notes about the main risks:</p>
<div class="Risks" style="width: 100%; margin: 0px; padding: 0px; text-align:left;">
<table class="Risks" style="border-spacing: 0;empty-cells: show; width:16.999cm; border-collapse: collapse; border: 1.0px solid #000000">
<colgroup>
<col style="width:8.5cm;" /></colgroup>
<tbody>
<tr>
<td class="Risks_A1" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:none;  border-top:1.0px solid #000000;  padding:0.097cm; ">
<p class="P2">Risk</p>
</td>
<td class="Risks_B1" style="vertical-align: top;  border:1.0px solid #000000;  padding:0.097cm; ">
<p class="P2">Mitigation Plan</p>
</td>
</tr>
<tr>
<td class="Risks_A2" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:none;  border-top:none;  padding:0.097cm; ">
<p>The software we develop here might end up being only used at one or a handful of institutions, which would then bear the maintenance load. </p>
<p /></td>
<td class="Risks_B2" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:1.0px solid #000000;  border-top:none;  padding:0.097cm; ">
<ul class="lib">
<li>
<p>Work to promote this solution to other ARROW sites.</p>
</li>
<li>
<p>Release all code as open source with tested documentation.</p>
</li>
<li>
<p>Use components of this solution as part of the standalone solution we have also been asked to look at, broadening the installed base for the RedBox application.</p>
</li>
<li>
<p>Consider funding a program to port VALET workflows to the new system for ARROW sites to build a sustainable community.</p>
</li>
<li>
<p>Work with the University of Melbourne and collaborators to see if some of the components developed for the RedBox application could be used at their sites, and to ensure compatibility with VITRO as a data store.</p>
</li>
<li>
<p>Document the metadata storage system and batch transformation system so that new Fedora-compatible ingest or portal tools can be swapped-in later.</p>
</li>
</ul>
</td>
</tr>
<tr>
<td class="Risks_A2" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:none;  border-top:none;  padding:0.097cm; ">
<p>University X unable to supply full metadata schema on time; there is no clear consensus on best practice for describing research data collections to meet the demands of The Code, while being able to serve RIF-CS to Research Data Australia.</p>
</td>
<td class="Risks_B2" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:1.0px solid #000000;  border-top:none;  padding:0.097cm; ">
<ul class="lib">
<li>
<p>Stakeholders to vigorously encourage ANDS to produce metadata guides, possibly after workshops.</p>
</li>
<li>
<p>If all else fails implement RIF-CS in the repository, with the possibility of doing a batch-update later.</p>
</li>
</ul>
</td>
</tr>
<tr>
<td class="Risks_A2" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:none;  border-top:none;  padding:0.097cm; ">
<p>Linked Data infrastructure including URI endpoints for parties and activities doesn&#8217;t come on-line in time for implementation.</p>
</td>
<td class="Risks_B2" style="vertical-align: top;  border-bottom:1.0px solid #000000;  border-left:1.0px solid #000000;  border-right:1.0px solid #000000;  border-top:none;  padding:0.097cm; ">
<ul class="lib">
<li>
<p>Work with the PIP project to make sure that this does not happen (at least for names), possibly entering name data into People Australia semi-manually.</p>
</li>
<li>
<p>Work with ANDS to set up interim systems where possible.</p>
</li>
<li>
<p>If all else fails, store strings in metadata fields, just as we have been doing in IRs for years, and update to URIs later when the infrastructure is available.</p>
</li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<p />
<p />
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/"><img alt="Creative Commons Attribution-Share Alike 2.5 Australia licence" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2010/03/m40ca94ba.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
<hr />
<div style="font-size: .9em;"><span class="footnote-defined"><a href="#ftn0-text" name="ftn0">1</a> I&#8217;m not convinced that using the ANDS handle infrastructure is a good idea. Firstly, ANDS is only funded for a short time, and while ANDS staff express their aspirations for keeping things going, there are no guarantees. Secondly, data has to have a URL pointing to where it is stored, and that has to be maintained as well, with redirects and so on when things change. In a lot of use-cases I think that using Handles just increases complexity, cost and risk.</span></div>
</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2010/03/04/more-details-on-a-metdata-store-for-data-inalongside-vital.htm/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>ANDS metadata stores: Describing metadata collections in VITAL</title>
		<link>http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm</link>
		<comments>http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm#comments</comments>
		<pubDate>Tue, 23 Feb 2010 08:03:54 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm</guid>
		<description><![CDATA[



The new model
Going further



I&#8217;m at the University of Newcastle visiting repository rat extraordinaire Vicki Picasso (actually at this bushland campus she should be a repository wallaby or something) and her colleague Dave Huthnance from IT. We are working on a model for how research data collections destined for Research Data Australia might be described and [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc">
<ul>
<li><a href="#id2">The new model</a></li>
<li><a href="#id4">Going further</a></li>
</ul>
</div>
<div>
<p>I&#8217;m at the University of Newcastle visiting repository rat extraordinaire Vicki Picasso (actually at this bushland campus she should be a repository wallaby or something) and her colleague Dave Huthnance from IT. We are working on a model for how research data collections destined for Research Data Australia might be described and managed in the local institutional repository.</p>
<p>(Please ANDS can we have some advice on this metadata issue? Some of you say to use RIF-CS and some say that&#8217;s a bad idea.)</p>
<p>At eResearch Australasia 2009, Vicki presented a model of how metadata about research data could be ingested into the VITAL repository they use at Newcastle; it featured the VALET system, which is a very simple repository ingest tool, and what Vicki calls <span class="spCh spChx201c">&#8220;</span>Institutional Data Triggers<span class="spCh spChx201d">&#8221;</span>, such as events in a grants database, which would fire off a metadata ingest workflow.</p>
<h1><a id="id2" name="id2" />The new model</h1>
<p>Today we refined that diagram. As it says, the <b>blue bits</b> represent the current Nova repository infrastructure (VITAL + VALET + Fedora), which feeds data to (amongst other things) the National Library&#8217;s harvesting systems.</p>
<p><b>The red bits </b>are new proposed infrastructure, to be developed, to enable collections metadata to be captured and RIF-CS metadata to be fed to Research Data Australia. The new red box labelled <span class="spCh spChx201c">&#8220;</span>Research Data Collections<span class="spCh spChx201d">&#8221;</span>, should it be built, will be a more sophisticated version of VALET, probably written in Java so it can work in the same Tomcat web container as Fedora <span class="spCh spChx2013">&#8211;</span> it would have a VALET-style simple forms interface for walk-up submissions (this <i style="text-decoration: underline; "><span>could</span></i> be used to replace the existing publications ingest and staging workflows too, as shown by the dotted red line <span class="spCh spChx2013">&#8211;</span> if this were a requirement).</p>
<p><b>Green is for external services.</b> One of the very interesting green bits is the Research Storage system, which is being provided by university IT and administered by the Research Office. I gather that this is essentially a file-store; we are proposing to add an interface that lets researchers see their files in the new (red) ingest system, add metadata to them, and flag them as candidates for RDA. I think Newcastle&#8217;s policy will be that if you want data to be available via Research Data Australia then it is desirable that it goes in the Research Storage System. Sounds good to me. To bridge the gap between files on a storage system and records in the repository, we are proposing a bit of middleware to link the file view of data to a web/repository view.</p>
<p>As discussed before here, the ANDS stakeholders in this project are keen for us to take a linked-data approach to metadata (slogan: Less typing, more linking!). I talked a bit about how this might work <a href="http://ptsefton.com/2010/02/23/ands-metadata-stores-integrating-vital-with-the-nlas-party-infrastructure-project.htm">in the previous post on name identities</a>; potential integration with services like the NLA&#8217;s PIP/People Australia and possible services like an ARC website for grant information are shown in green at the bottom right of the diagram. (I have some input from Basil at the NLA I need to process, but at this stage I think we&#8217;re looking at having <a href="http://nicnamesproject.blogspot.com/">NicNames</a> in there so institutions can manage their own metadata.)</p>
<p />
<p />
<p><span style="display: block"><a name="graphics1" /><img alt="graphics1" class="fr2" height="426" src="http://ptsefton.com/wp-content/uploads/2010/02/m117b3490_643x426.jpeg" style="border:0px; vertical-align: top" width="643" /></span></p>
<p />
<p>One assumption we&#8217;re making is that the core class of item we&#8217;re describing is a collection, which should fit with the kind of item that is already in the repository, which is <b>research outputs</b>, like data.</p>
<p>There are some questions, of course. </p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>What metadata schema to use for describing data collections?</p>
</li>
<li>
<p>And where would the ISO 2146 notion of Services fit in? The services listed in the RIF-CS documentation are all repository-type search/feed services, so it seems appropriate either to tie them in to the OAI-PMH &#8216;Identify&#8217; verb or to let repository managers simply enter them into an ANDS system directly.</p>
</li>
</ol>
<h1><a id="id4" name="id4" />Going further</h1>
<p class="P1">One idea that has come up is that VITAL sites might want to use Fedora and the OAI-PMH feeds available off it but not expose them via a web portal at all. In conversation with Teula Morgan from Swinburne today, Vicki proposed a model where there is no portal interface. I call this a &#8216;headless&#8217; approach; there would be a local management interface for research data collection metadata (the red box) but it could be that the primary discovery mechanism is outsourced to RDA. This is pretty common for university web sites <span class="spCh spChx2013">&#8211;</span> USQ uses Google for our website search service, for example.</p>
<p class="P1">I am also exploring the idea that this ingest tool, which will be able to put records into Fedora (which as far as I know nobody has ever been fired for acquiring), could form the basis for our major deliverable on our ANDS metadata stores project: a specification for a stand-alone metadata-about-research-data-collection system.</p>
<p class="P1" />
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/"><img alt="Creative Commons Attribution-Share Alike 2.5 Australia licence" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2010/02/m40ca94ba2.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2010/02/23/ands-metadata-stores-describing-metadata-collections-in-vital.htm/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>ANDS Metadata Stores: Integrating VITAL with the NLA&apos;s Party Infrastructure Project</title>
		<link>http://ptsefton.com/2010/02/23/ands-metadata-stores-integrating-vital-with-the-nlas-party-infrastructure-project.htm</link>
		<comments>http://ptsefton.com/2010/02/23/ands-metadata-stores-integrating-vital-with-the-nlas-party-infrastructure-project.htm#comments</comments>
		<pubDate>Tue, 23 Feb 2010 02:28:27 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2010/02/23/ands-metadata-stores-integrating-vital-with-the-nlas-party-infrastructure-project.htm</guid>
		<description><![CDATA[



Why is this important?
How?
Linked data more generally



Here is some more news on the metadata stores project for ANDS (see previous posts) and how we might build links between VITAL/Fedora and an identity service for people (parties in ANDS-speak). This is potentially a step on the way to a linked-data future not just for research data [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2010/02/23/ands-metadata-stores-integrating-vital-with-the-nlas-party-infrastructure-project.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc">
<ul>
<li><a href="#id2">Why is this important?</a></li>
<li><a href="#id4">How?</a></li>
<li><a href="#id5">Linked data more generally</a></li>
</ul>
</div>
<div>
<p>Here is some more news on the metadata stores project for ANDS (see <a href="http://delicious.com/ptsefton/andsmetadatastores">previous posts</a>) and how we might build links between VITAL/Fedora and an identity service for people (parties in ANDS-speak). This is potentially a step on the way to a linked-data future not just for research data but also for institutional repositories. </p>
<p>Natasha Simons said, on the CAIRSS list:</p>
<blockquote class="bq"><p>An exciting new project, which was mentioned at the CAIRSS Community Day last December, has commenced at the National Library of Australia (NLA). The Australian Research Data Commons Party Infrastructure Project (ARDCPIP) is an Australian National Data Service (ANDS) funded project being carried out&#160; by the NLA to enable improved discovery of research outputs and data through assigning public and persistent identifiers to Australian researchers and research organisations using the NLA<span class="spCh spChx2019">&#8217;</span>s People Australia Infrastructure. The project will involve consultation with the research sector and more information about it will be made available to you shortly.&#160;</p>
</blockquote>
<p>This is good news for the metadata stores work we&#8217;re doing with ANDS. I was in Canberra in early February and we caught up with Basil Dewhurst and Natasha; it seems like the timing is good to try to get a link-up between the PIP project and the work we&#8217;re doing to sketch the design of a metadata store for data on top of the VITAL repository.</p>
<h1><a id="id2" name="id2" />Why is this important?</h1>
<p>One of the problems we have been talking about with ANDS is how to represent people. In VITAL names are typically not subject to authority control, and even if they are it is via some kind of people-driven workflow, not a feature of the software. (Any exceptions to this? Please let me know in the comments.) </p>
<p>The linked-data semantic web ideal would be for there to be some kind of resolvable identifier stored each time a party is mentioned. You want at least two bits of data (using the example that Basil gave in the comments here last time).</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>The name as it appears on a particular work; that&#8217;s a string, like Cappo, Michael Charles.</p>
</li>
<li>
<p>A URI <span class="spCh spChx2013">&#8211;</span> such as <a href="http://nla.gov.au/nla.party-471077">http://nla.gov.au/nla.party-471077</a></p>
</li>
<li>
<p>And maybe a canonical, normalized way of referring to the name for a particular service <span class="spCh spChx2013">&#8211;</span> the heading at the NLA&#8217;s Trove site is Cappo, Michael.</p>
</li>
<li>
<p>Possibly more URIs that refer to the same party.</p>
</li>
</ol>
<p>If we had this it would mean that submitting data about data collections to the Australian Research Data Commons would be much more concise and accurate <span class="spCh spChx2013">&#8211;</span> it should not be necessary to include any party data at all in a feed from a metadata store to the commons, just URIs.</p>
<p>But, in the case of University X, not all the researchers have persistent identifiers. So, I have suggested that we try to create persistent identifiers for all parties as the first step in building a metadata store on top of VITAL. This is good in several ways:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>Having an early adopter on board will help the PIP project at the NLA.</p>
</li>
<li>
<p>This will bring together three ANDS-funded projects and promote general metadata harmony.</p>
</li>
<li>
<p>University X will get a significant upgrade to the integrity and usefulness of their existing data.</p>
</li>
</ol>
<h1><a id="id4" name="id4" />How?</h1>
<p>What we&#8217;re looking at is a pre-load stage to populate PIP/People Australia with the identities of people already mentioned in the repository.</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p><b>Harvest name data from VITAL </b>- could be via MARC / DC / or some other format derived from those, over OAI-PMH or via some other batch load. Contextual information is likely to be limited to subject and affiliation data pretty much &#8211; you won&#8217;t get birthdates. Vicki Picasso tells me that at Newcastle they record which authors belong to their institution, so for that institution this process could concentrate on the authors they know are theirs. </p>
</li>
<li>
<p><b>Records are auto-matched in&#160;PIP</b> using whatever algorithms it has.</p>
</li>
<li>
<p>A person (probably at University X) uses the NLA&#8217;s private People Australia tools to <b>sort out the name&#160;references and associate the various publications with the right person ID.</b></p>
</li>
<li>
<p>Once the name data has been checked (step 3) the developer writes some code to <b>inject the ID data back into Fedora</b>, in MARC (if that&#8217;s possible) or into a new datastream. (If my team got the job we would do this by writing some custom index rules for our Fedora indexer, which is part of the modular system that is <a href="http://fascinator.usq.edu.au/">The Fascinator</a>.)</p>
</li>
<li>
<p>The developer creates an ingest system (like VALET).  As part of the form-filling process it would look up name IDs at point of ingest and trigger creation of new parties when one is not found.  We will write up more detail about how this would look, using Newcastle as an example.</p>
</li>
</ol>
<p class="P1">There is a variant on this where NicNames might be used between steps 1 and 2, so that&#160;PIP would be fed cleaner data.</p>
<p class="P1">Once the initial load was done, many names associated with data collections would already have URIs, so these would not have to be re-entered; for &#8216;new&#8217; identities there would have to be some kind of system for minting a new ID and then flagging it for attention from a name-curator at a later time.</p>
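<p class="P1">The point-of-ingest lookup described above can be sketched roughly as follows. This is a toy in-memory stand-in, not the PIP/People Australia interface: the <code>PartyIndex</code> class, its method names, the crude name normalisation and the <code>urn:example:party:</code> scheme for provisional IDs are all assumptions made up for illustration. Only the example name and NLA URI come from this post.</p>
<pre>
```python
class PartyIndex:
    """Toy stand-in for a party identity lookup used at point of ingest."""

    def __init__(self):
        self.uris = {}            # normalised name string -> party URI
        self.needs_curation = []  # (name as typed, provisional URI) pairs

    @staticmethod
    def _key(name):
        # Assumption: names match if they agree after lower-casing and
        # collapsing whitespace. A real service would do far better.
        return " ".join(name.lower().split())

    def register(self, name, uri):
        """Record a name that already has a persistent identifier."""
        self.uris[self._key(name)] = uri

    def resolve(self, name):
        """Return the URI for a name, minting a provisional one if unknown."""
        key = self._key(name)
        if key not in self.uris:
            # Hypothetical provisional scheme, flagged for a name-curator.
            provisional = "urn:example:party:{}".format(len(self.needs_curation) + 1)
            self.uris[key] = provisional
            self.needs_curation.append((name, provisional))
        return self.uris[key]


index = PartyIndex()
index.register("Cappo, Michael Charles", "http://nla.gov.au/nla.party-471077")
print(index.resolve("cappo,  michael charles"))  # known party, existing URI
print(index.resolve("Researcher, New"))          # unknown, provisional ID minted
print(index.needs_curation)                      # queue for later curation
```
</pre>
<p class="P1">The design choice worth noting is that the ingest form never blocks on curation: an unknown name still gets an identifier straight away, and the clean-up happens later.</p>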
<h1><a id="id5" name="id5" />Linked data more generally</h1>
<p>I have described a process here for party identities; it would also be good to have the same kind of approach for activities (in the ISO-2146 sense) which means<b> research projects </b>in this context. Andrew Treloar of ANDS Monash branch tells me that there has been talk of having the major funding bodies like ARC provide a PIP-like service for grant funded projects, providing some kind of URI that a human or a machine could look up, getting a web page or an RDF record as appropriate. The RDF record could be used to extract trusted data like project names at both the repository and the ARDC registry but nobody would have to re-type that data into a form.</p>
<p> Does anyone out there know anything about this potential service for Activities? Can you comment?</p>
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/"><img alt="Creative Commons Attribution-Share Alike 2.5 Australia licence" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2010/02/m40ca94ba1.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2010/02/23/ands-metadata-stores-integrating-vital-with-the-nlas-party-infrastructure-project.htm/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ANDS Metadata store: starting point</title>
		<link>http://ptsefton.com/2010/02/02/ands-metadata-store-starting-point.htm</link>
		<comments>http://ptsefton.com/2010/02/02/ands-metadata-store-starting-point.htm#comments</comments>
		<pubDate>Tue, 02 Feb 2010 05:41:07 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2010/02/02/ands-metadata-store-starting-point.htm</guid>
		<description><![CDATA[



Starting point
Issues



This is the second post about the ANDS-funded metadata store work we&#8217;re doing at ADFI. The project now has a Trac site where we will be tracking progress and keeping notes; the site will be open to the public to read, to make it as easy as possible to reach a wide range of [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2010/02/02/ands-metadata-store-starting-point.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc">
<ul>
<li><a href="#id2">Starting point</a></li>
<li><a href="#id3">Issues</a></li>
</ul>
</div>
<div>
<p class="P3">This is the second post about the ANDS-funded metadata store work we&#8217;re doing at ADFI. The project now has a <a href="https://204.236.227.98/projects/metadatastore/trac/wiki/">Trac site</a> where we will be tracking progress and keeping notes; the site will be open to the public to read, to make it as easy as possible to reach a wide range of stakeholders, although there will be a few documents we have to keep under tighter control. The Trac site will be mainly used as a project wiki <span class="spCh spChx2013">&#8211;</span> but I will put in some job tickets and use the milestones feature to track what happens between our (mostly) weekly project meetings. <a href="https://204.236.227.98/projects/metadatastore/trac/milestone/2010-02-08">This week&#8217;s milestone is up now.</a></p>
<p class="P3">In this post I&#8217;ll reveal some detail about our starting point (though not the name of the institution(s) we&#8217;ll be working with) and follow up on some of the feedback I got from ANDS staffers to my <a href="http://ptsefton.com/2010/01/15/new-project-metadata-for-data-collections.htm">previous post</a>. Scott Yeadon raised quite a few points, some of which I will get to in future posts and/or project plans and wiki pages.</p>
<h1><a id="id2" name="id2" />Starting point</h1>
<p>ADFI staff met with ANDS stakeholders late in 2009, and we have agreed that a good starting point will be to focus on one of the &#8216;additional&#8217; deliverables first. The core deliverable is a project plan to build a stand-alone metadata store, with an option to write extra plans for add-ons or customisations to existing repository software such as DSpace or Eprints should any organisations want to keep metadata about data collections in their IR. It happens that there has been a fair bit of work at at least one institution in Australia (University X) where they do plan to keep metadata about collections (and maybe parties, and so on) in their IR; they are running the VITAL software which was associated with the ARROW project. So that&#8217;s our starting point: a project plan to write some open source software and supply configuration files, customisation and documentation for an ARROW repository, so that University X and other sites running the ARROW suite of software can participate in the data commons and submit data to Research Data Australia via their IR.</p>
<p>The VITAL software is a web interface to Fedora, with configuration to index a Fedora repository and display its contents. You can <a href="http://arrow.monash.edu.au/vital/access/manager/Index">see it in action</a> at the home of ARROW, Monash University. Some sites use a bit of free software called VALET to put things into the repository.</p>
<p>VALET has a simple design which I like <span class="spCh spChx2013">&#8211;</span> it allows you to design a form or forms to capture as much metadata as you like and configure a set of really simple approval steps. When a user starts adding metadata about a new object, the system saves the in-progress data by the simple expedient of serialising the form data to XML. Moving to the next step of a workflow just requires the application to put the saved data back into the form. When a user with the correct rights adds the item to the repository, pre-configured XSLT stylesheets run automatically to transform the serialised form data to whatever is required, usually MARCXML and Dublin Core.</p>
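<p>To make the save-and-restore idea concrete, here is a minimal sketch of serialising in-progress form data to XML and loading it back into a form. It is not VALET&#8217;s code: the function names, the <code>form</code>/<code>field</code> element names and the sample field values are assumptions for illustration, and the final XSLT step to MARCXML or Dublin Core is left out because the Python standard library has no XSLT processor.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def form_to_xml(form_data):
    """Serialise in-progress form data (field name to value) as an XML string."""
    root = ET.Element("form")
    for name, value in form_data.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_form(xml_text):
    """Rebuild the form data from saved XML for the next workflow step."""
    root = ET.fromstring(xml_text)
    # An empty field comes back with text None, so map that to "".
    return {field.get("name"): field.text or "" for field in root.iter("field")}


# Hypothetical field names and values, for illustration only.
draft = {"title": "Reef survey data 2009", "creator": "Cappo, Michael Charles"}
saved = form_to_xml(draft)      # what would be written to disk between steps
restored = xml_to_form(saved)   # what the approver would see in the form
print(restored == draft)        # prints True
```
</pre>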
<p>The ARROW project sponsored a replacement for VALET called Squire which <a href="http://ptsefton.com/2008/07/31/improving-valet-part-2.htm">I reported on in 2008</a>, but so far nobody has used it in anger. For this project I think it might be good to use Squire, or something like it, rather than VALET; we&#8217;re discussing the pros and cons with ANDS and University X.</p>
<p>Over the next week I will be putting together a skeletal draft of a project plan based around the proposed architecture, ready for stakeholders at ANDS and the IR and eResearch communities to comment on now that people are back from their summer holidays.</p>
<h1><a id="id3" name="id3" />Issues</h1>
<p>Some of the things which will need to be resolved are:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p><b>Which OAI-PMH provider to use?</b></p>
<p>Metadata will get from the VITAL repository to Research Data Australia via an OAI-PMH feed, but there are a few open source toolkits to choose from. We will need to support at least one, maybe more. As ANDS staffer Xiaobin Shen reminded me in the comments of my last post, one consideration will be whether or not the provider supports deletions; not all do. This will require careful testing before we commit to one provider or another.</p>
</li>
<li>
<p><b>What data model will be used?</b> </p>
<p>At the moment, all the VITAL repositories in Australia that I know about have a very simple data model. Each item in the repository has a &#8216;master&#8217; metadata record usually in MARCXML, sometimes MODS (they&#8217;re effectively the same) with derived metadata in Dublin Core. There may also be datastreams, usually PDF files. At this stage there is no formal content model, so the datastreams could be called anything and it&#8217;s up to humans to make sense of them; there&#8217;s no guaranteed way to tell whether a PDF is an abstract, or the whole record, a preprint or a published version of a paper, for example.</p>
<p>It&#8217;s a bit hard to get information about VITAL unless you&#8217;re a customer; the <a href="http://www.vtls.com/products/vital">product page</a> is currently sporting a copyright statement from 2008 and the brochure (<a href="http://www.vtls.com/media/en-US/brochures/vtls_vital.pdf">PDF</a>) is big on tropical fish and short on specifications so what I report here may need correcting. </p>
<p>The version of VITAL which I think most sites are running in Australia is 3.x. It uses Fedora 2, which doesn&#8217;t have formal content models. Fedora 3 has a formal mechanism for describing content models, which means that you could describe the parts of an object and their role in the object. From what I can gather, VITAL 4, which I saw demonstrated in late 2008 and which was released in March 2009, has content models too, but they are more about how to display an object than describing the relations between its parts. Perhaps someone could elaborate or correct this in the comments?</p>
<p>My working assumption is that for this development, the idea will be to stick with the way VITAL 3.x works, without worrying about content modelling. That is fine here because this is not about complex objects with lots of data; it&#8217;s about metadata about data collections where the collections themselves will usually reside elsewhere, which brings us to the sub-question.</p>
<p><b>What goes in the repository and what is stored elsewhere?</b></p>
<p>There&#8217;s a real chicken and egg problem here. I gather that eventually the NLA will be running a party-identifier service based on People Australia, so when that&#8217;s established we won&#8217;t be typing names into metadata forms any more, we&#8217;ll be linking to an ID. So in the abstract model behind RIF-CS the party management just goes elsewhere. I wonder if the same could happen with activities (every project has some kind of web site now, so why not point to an RDF endpoint hosted on the project website or just to the project web site as an identifier) and services (not sure what to do about these, but then I&#8217;m not really sure yet what services are in this context).</p>
<p>Question is, if we want to get a system running now, what&#8217;s the best way to identify parties in a future-proof way? I discussed a related issue in a <a href="http://caulcairss.wordpress.com/2009/07/21/nicnames-and-people-australia-some-thoughts-for-cairss/">blog post for CAIRSS about NicNames and People Australia</a>. Maybe NicNames can play a role here in the short term?</p>
<p>One of the design patterns I mentioned in that post, using an index to associate names in a repository with an identity service like NicNames, is expanded in a paper I wrote for the New Review of Information Networking. At the moment I can only link to <a href="http://www.informaworld.com/smpp/section?content=a917251410&amp;fulltext=713240928">this version</a>, which is not open, as the publisher has not responded to my questions about the OA policy, so I have yet to deposit a version in Eprints.</p>
</li>
<li>
<p><b>Which metadata format to use?</b></p>
<p>Scott Yeadon made it clear in the comments to my last post that RIF-CS was designed as an interchange format only and that it is not yet stable, which sound like good reasons not to use it as a storage format. But I have confirmed reports that others in ANDS are thinking otherwise, and are encouraging IR managers to put RIF-CS in the repository; I&#8217;d like to hear their side of the story too. Stability aside, if RIF-CS has what it takes to describe a collection of data then it might be an OK storage format.</p>
<p>I&#8217;m not aware of all the alternatives but one that I have heard mentioned by an ANDS person is the <a href="http://dublincore.org/groups/collections/collection-application-profile/">Dublin Core Collections Application Profile</a>. What else is out there in use for describing data collections, and are there other data-collections registries harvesting from those descriptions in the rest of the world?</p>
<p>So this is an open issue for now; I hope we can get some consensus on a good data model for storing metadata about data collections (and the other entities). </p>
</li>
<li><b>
<p>What configuration is needed in VITAL? I think we need the following:</p>
<p></b>
<ul class="lib">
<li>
<p>Display for items (are we going to have parties and services and actions as items as well?)</p>
</li>
<li>
<p>Index configuration for VITAL&#8217;s Solr Index.</p>
</li>
</ul>
<p>And if the data collection resides in the repository, should there be a collection object <b>and </b>a collection-description object, or just one object with both collection and description?</p>
</li>
<li><b>
<p>What kinds of APIs do we need?</p>
<p></b>
<p>VALET can be used to integrate with other systems via XML files which are deposited in a directory and picked up by the ingest workflow of VALET, so they can be curated by data librarians, a technique which I think was developed by Simon McMillan at UNE in the RUBRIC days. We can certainly implement this, but should we also have a web (or other) API so that a system such as a grants database can add a new item? Should it be AtomPub or a simple POST, or SWORD or something else?</p>
</li>
</ol>
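<p>If the answer turned out to be AtomPub or a simple POST, the payload that a grants database sends might look something like the output of the Python sketch below. The function name and the choice of fields are assumptions made purely for illustration; a complete Atom entry also needs id and updated elements, and nothing here reflects an actual VALET or VITAL interface.</p>

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # the Atom namespace


def build_entry(title, author, summary):
    """Build a bare-bones Atom entry describing a new data collection.

    Hypothetical helper: the fields chosen here are illustrative only.
    """
    # Serialise Atom elements without a prefix.
    ET.register_namespace("", ATOM)
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = title
    author_el = ET.SubElement(entry, "{%s}author" % ATOM)
    ET.SubElement(author_el, "{%s}name" % ATOM).text = author
    ET.SubElement(entry, "{%s}summary" % ATOM).text = summary
    return ET.tostring(entry, encoding="unicode")


# Made-up values standing in for a record exported from a grants database.
payload = build_entry(
    "Reef survey data",
    "A. Researcher",
    "Data collection arising from a funded project.",
)
```

<p>The resulting string could be dropped into the VALET pick-up directory or sent as the body of an HTTP POST; which of those is preferable is exactly the open question above.</p>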
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/" name="HTTP:::DBPEDIA.ORG:SNORQL:?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%"><img alt="HTTP://DBPEDIA.ORG/SNORQL/?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2010/02/m40ca94ba.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2010/02/02/ands-metadata-store-starting-point.htm/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Project: Metadata for data collections</title>
		<link>http://ptsefton.com/2010/01/15/new-project-metadata-for-data-collections.htm</link>
		<comments>http://ptsefton.com/2010/01/15/new-project-metadata-for-data-collections.htm#comments</comments>
		<pubDate>Thu, 14 Jan 2010 21:37:40 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2010/01/15/new-project-metadata-for-data-collections.htm</guid>
		<description><![CDATA[



Scope &#38; Deliverables
Issues for a new standalone metadata store
Issues with adapting an IR



[This is a re-post. I put this up a couple of days ago but my hosting provider lost the server for a day and had to restore from backup.]
At the Australian Digital Futures Institute we&#8217;ve landed a contract with ANDS (the Australian National [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2010/01/15/new-project-metadata-for-data-collections.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc">
<ul>
<li><a href="#id2">Scope &amp; Deliverables</a></li>
<li><a href="#id3">Issues for a new standalone metadata store</a></li>
<li><a href="#id4">Issues with adapting an IR</a></li>
</ul>
</div>
<div>
<p>[This is a re-post. I put this up a couple of days ago but my hosting provider lost the server for a day and had to restore from backup.]</p>
<p>At the Australian Digital Futures Institute we&#8217;ve landed a contract with <a href="http://ands.org.au/">ANDS</a> (the Australian National Data Service) to write some software specifications for key bits of infrastructure for the data commons. I&#8217;m going to be doing most of the initial work on this one, and I&#8217;ll use my blog and Twitter to communicate what I&#8217;m up to as I go, as per the communication plan for the project. I know that some of my regular readers are interested in this stuff; please comment if you have anything to add.</p>
<p>This first post will look at the scope of the consultancy, and talk about some of the major issues that I think we&#8217;ll need to resolve.</p>
<h1><a id="id2" name="id2" />Scope &amp; Deliverables</h1>
<p>Our contract is to <b>provide specifications for software applications to help institutions manage metadata about data collections and project plans to build them</b>. There are two models for these applications. The first is a stand-alone system, while the second is using the existing institutional repository to store metadata <span class="spCh spChx2013">&#8211;</span> here&#8217;s what the project plan says:</p>
<blockquote class="bq"><p><a name="h5ea3d4bcp1" />The proposed [agreed now] process is for ADFI staff to prepare at least one scoped and costed software specification, with an agreed development methodology for a:</p>
<blockquote class="bq"><p><a name="h7af33813p1" />[<span class="spCh spChx2026">&#8230;</span>] standalone metadata store that can be used to augment either an existing institutional repository or data store, and which will manage metadata about data objects and data collections. This metadata will need to complement the existing object-level metadata. The interface should be designed to be web-based and easy to use by repository managers without specific technical skills. The metadata store should be able to generate collection descriptions as RIF-CS (http://www.globalregistries.org/rifcs.html) and make these available for harvesting using OAI-PMH and/or direct harvest of XML. </p>
<p>Source: ANDS-internal document. </p>
</blockquote>
<p><a name="h1ec12d8fp1" />Additionally we will develop one or more specifications for providing metadata capture solutions based on existing institutional repository software.</p>
</blockquote>
<p>So the main thing we&#8217;ll be looking at is a kind of repository or registry-like thing where institutions can describe their data collections, and publish those descriptions to <a href="http://services.ands.org.au/home/orca/rda/">Research Data Australia</a>. If you visit that site you can see the data model that&#8217;s being used by ANDS with four classes of thing. Here&#8217;s <a href="http://services.ands.org.au/home/orca/rda/">a snapshot of how it looks today</a>:</p>
<blockquote class="bq"><p>Collections (877)</p>
<p>Where a collection is a useful grouping of physical or digital items.</p>
<p>Parties (260)</p>
<p>Where a party is a person or organisation that has some relationship to a collection, service, activity, or party.</p>
<p>Services (1)</p>
<p>Where a service is a mechanism for gaining some kind of access to or information about a collection (or items within a collection).</p>
<p>Activities (2)</p>
<p>Where an activity is an undertaking or process related to the creation, update, or maintenance of a collection. [I think projects are activities - PS]</p>
</blockquote>
<p>This model, which is expressed in a schema known as <a href="http://ands.org.au/resource/rif-cs.html">RIF-CS</a>, is based on a yet-to-be-approved ISO Standard (ISO2146). RIF-CS implements quite a different approach from the way most Institutional Repositories (IRs) are structured, where the repository items themselves are the only primary objects. </p>
<p>In an IR you typically have metadata which <b>refers to parties and services and so on using text-strings</b>. Parties, services and activities <b>are not things</b> in the repository (speaking in general here, there are some exceptions, and increasingly people have some status as objects in repositories like Fez). A publisher, which is a party, will be described in an XML document conforming to some metadata schema (in Australia this is typically Dublin Core, MODS or MARC XML)<span style="font-weight:normal; "><span class="T5"> using its name expressed as a string</span></span>. If two records refer to the same party using different strings then things start to get messy. And as we all know, there is one kind of party around which strings proliferate and cause confusion: people. There is an ANDS effort under way to provide services for describing people outside of a repository so you can refer to them using some kind of ID, but that service is a way off still.</p>
<p><a href="http://www.globalregistries.org/rifcs.html">RIF-CS is described here</a> as an interchange format (it is, after all, the Registry Interchange Format &#8211; Collections and Services), which I gather was originally cooked up to support the <a href="http://www.globalregistries.org/">Global Registries Initiative</a>. But I think some people are thinking of using it as a storage format within repositories. If we decide to do that we will have to do so carefully. For example, you would not want to store this <a href="http://services.ands.org.au/documentation/rifcs/example/rif.xml">example RIF-CS</a> as a single repository object, trust me, as the file contains a number of things that really should be treated as discrete objects in a repository, including metadata about people and projects.</p>
<h1><a id="id3" name="id3" />Issues for a new standalone metadata store</h1>
<p>If the goal for the standalone metadata store is to support the abstract model behind RIF-CS then one of the challenges of this project will be working out how to support this kind of model. How do we make sure that parties and so on are described once, and as accurately as possible? And where that fails, how do we make sure there is infrastructure to assert that two differently described parties are the same, and that two identically described parties are in fact different? If you want to refer to a party that is not described yet, how will we support that? Will people and objects and so on be primary objects?</p>
<p>In discussions so far, we have tentative agreement between our ANDS stakeholders and ADFI on one key point: as far as possible we&#8217;d like the stand-alone metadata store to be Linked (Open) Data ready. This means that all Collections, Parties, Services and Activities would have URIs, and the metadata store would allow data entry that uses the URIs behind the scenes. (I checked with Scott Yeadon at ANDS and he tells me that RIF-CS &#8216;keys&#8217; can be URIs and are expected to be globally unique, even though this is not entirely clear from the schema documentation). This is a user interface challenge, but if we can pull it off we should be able to avoid stuff like this, where a mistyped string results in two parties where there should be one:</p>
<blockquote class="bq"><p><a href="http://services.ands.org.au/home/orca/rda/list.php?group=&amp;class=Party&amp;page=2">http://services.ands.org.au/home/orca/rda/list.php?group=&amp;class=Party&amp;page=2</a>
<ul class="lib">
<li>
<p><span style="color:#cb6811; font-size:9.75pt; font-style:normal; font-variant:normal; font-weight:normal; letter-spacing:normal; text-decoration: underline; text-transform:none; "><span class="T1">Australian Institute od Marine Science</span></span><span style="color:#283f09; font-size:9.75pt; font-style:normal; font-variant:normal; font-weight:normal; letter-spacing:normal; text-transform:none; "><span class="T2">&#160;</span></span></p>
</li>
<li>
<p><span style="color:#688113; font-size:9.75pt; font-style:normal; font-variant:normal; font-weight:normal; letter-spacing:normal; text-transform:none; "><span class="T3">Australian Institute of Marine Science</span></span><span style="color:#283f09; font-size:9.75pt; font-style:normal; font-variant:normal; font-weight:normal; letter-spacing:normal; text-transform:none; "><span class="T2">&#160;</span></span></p>
</li>
</ul>
</blockquote>
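<p>To make the idea concrete, here is a minimal Python sketch of keying parties by URI, so that a mistyped name becomes just another label on the same party rather than a second party. The URI, the register_party function and the in-memory dictionary are all hypothetical; this is not a description of how RIF-CS or Research Data Australia work internally.</p>

```python
# Hypothetical registry: the URI is the party's identity, names are only labels.
parties = {}


def register_party(uri, name):
    """Attach a display name to the party identified by uri."""
    record = parties.setdefault(uri, {"names": set()})
    record["names"].add(name)
    return record


# A made-up identifier, used here purely for illustration.
AIMS = "http://example.org/party/aims"

register_party(AIMS, "Australian Institute of Marine Science")
# The typo ends up as a second label, not as a second party.
register_party(AIMS, "Australian Institute od Marine Science")
```

<p>After both calls the registry still holds a single party with two name variants, which a curator could tidy up later without touching any of the records that point at the URI.</p>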
<p>So this is all pointing to a repository which uses RDF to describe things, drawing on appropriate vocabularies for relations and values, such as <a href="http://www.foaf-project.org/">FOAF</a> for parties, the <a href="http://bibliontology.com/">bibliographic ontology</a> for documents, and maybe the <a href="http://dublincore.org/groups/collections/collection-application-profile/">Dublin Core Collections</a> for data collections. We could, of course, build a classic database-backed system with a database schema that reflects the abstract model directly, but that&#8217;s very inflexible and not easily extensible.</p>
<p>(We don&#8217;t have a lot of experience with large-scale RDF at ADFI, but I am thinking that in addition to the RDF, and a triple store to keep it in, you would probably still have a high-performance index of the repository using Apache Solr to provide the search/browse interface.)</p>
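<p>As a rough illustration of the RDF-style model, the following Python sketch holds a few statements as plain (subject, predicate, object) tuples and follows a link from a collection to the name of its creator. The FOAF, Dublin Core terms and DCMI type namespaces are the real ones; the example.org URIs, the collection title and the helper functions are invented, and the choice of dcterms:creator to link collection to party is only one of several possible relations. A real system would use a triple store rather than a Python set.</p>

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Made-up URIs for one party and one collection.
party = "http://example.org/party/aims"
collection = "http://example.org/collection/reef-survey"

# Each statement is a (subject, predicate, object) tuple.
triples = {
    (party, RDF_TYPE, FOAF + "Organization"),
    (party, FOAF + "name", "Australian Institute of Marine Science"),
    (collection, RDF_TYPE, DCMITYPE + "Collection"),
    (collection, DCTERMS + "title", "Reef survey data"),
    # The creator is a link to the party, not a copy of its name.
    (collection, DCTERMS + "creator", party),
}


def objects(subject, predicate):
    """Return all objects of statements matching subject and predicate."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


def creator_names(collection_uri):
    """Follow creator links from a collection and collect the parties' names."""
    names = []
    for creator in objects(collection_uri, DCTERMS + "creator"):
        names.extend(objects(creator, FOAF + "name"))
    return names
```

<p>Because the collection points at the party rather than repeating its name, correcting the name in one place corrects it everywhere it is displayed.</p>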
<h1><a id="id4" name="id4" />Issues with adapting an IR</h1>
<p>If we work on storing data collection metadata in IRs then there will be additional challenges. All institutional repositories in Australia already support Dublin Core metadata, at least in their OAI-PMH feeds. But a lot of repository software in use in Australia is limited to storing metadata as plain strings. Because of this, most of the repository people are used to thinking in terms of flat metadata models serialised in XML, not networks of relations, RDF style. Anything we do to augment IR software will have to fit in with the way IRs are used now, mainly for document content. There are lots of design challenges there for the information model and user interface. We have started thinking about this in detail for a particular repository platform at a particular uni, more on that soon.</p>
<p class="center">Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/" name="HTTP:::DBPEDIA.ORG:SNORQL:?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%"><img alt="HTTP://DBPEDIA.ORG/SNORQL/?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2010/01/m40ca94ba.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2010/01/15/new-project-metadata-for-data-collections.htm/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Bye Bye Word 2007 Custom XML ?</title>
		<link>http://ptsefton.com/2009/12/23/bye-bye-word-2007-custom-xml.htm</link>
		<comments>http://ptsefton.com/2009/12/23/bye-bye-word-2007-custom-xml.htm#comments</comments>
		<pubDate>Wed, 23 Dec 2009 00:42:29 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2009/12/23/bye-bye-word-2007-custom-xml.htm</guid>
		<description><![CDATA[



I have argued here repeatedly that building applications using Microsoft Word&#8217;s Custom XML feature is a spectacularly bad idea, but it turns out that I missed the best reason of all not to use custom XML: It may well be going away very soon, because, I gather, it infringed someone&#8217;s patent. In reaction to this [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2009/12/23/bye-bye-word-2007-custom-xml.htm"><!-- &nbsp; --></abbr>
<div>
<div class="page-toc" />
<div>
<p>I have argued here repeatedly that building applications using Microsoft Word&#8217;s Custom XML feature is a spectacularly bad idea, but it turns out that I missed the best reason of all not to use custom XML: <b>It may well be going away very soon</b>, because, I gather, <a href="http://blogs.zdnet.com/microsoft/?p=4835">it infringed someone&#8217;s patent</a>. In reaction to this news, <a href="http://www.tbray.org/ongoing/When/200x/2009/12/22/On-Custom-XML">Tim Bray is reiterating his case that rolling your own XML is a bad idea</a>:</p>
<blockquote class="bq"><p>People like me, who had experience with the extreme difficulty of doing this kind of customization, the extremely limited number of places where it made sense, and the high proportion of failure among people who tried to do it, shouted <span class="spCh spChx201c">&#8220;</span>That<span class="spCh spChx2019">&#8217;</span>s a&#160;bug!<span class="spCh spChx201d">&#8221;</span> Given that the number of organizations that deploy Office is huge, I bet Microsoft can trot out a few customers who<span class="spCh spChx2019">&#8217;</span>ve got good results with Custom XML. But I also bet that, first of all, the proportion who try is tiny and, second, that among those who do, few succeed in getting much business value.</p>
</blockquote>
<p>I&#8217;m one of these people like Tim and like him, I told you so. Let&#8217;s revisit some of this:</p>
<ul class="lib">
<li>
<p><a href="http://ptsefton.com/2009/03/16/opening-up-microsoft.htm">Earlier this year I agreed with Glyn Moody that using custom XML</a> in a collaboration between Science Commons and Microsoft was promoting lock-in to Word 2007 and later versions. If the feature is removed from future versions then would MS Research like to consider the <a href="http://ptsefton.com/2009/03/27/more-on-microsoft-collaboration-and-word-processor-interop.htm">suggestion I made in March</a> to embed ontological annotations using links? The ontology plugin in question used the simplest of schemas, so simple that there was really no need for custom XML at all.</p>
</li>
<li>
<p>I have also expressed doubts about the usefulness of a more complicated plugin that also comes out of Microsoft Research: the <a href="http://www.microsoft.com/downloads/details.aspx?familyid=09c55527-0759-4d6d-ae02-51e90131997e">Article Authoring Add-in for Microsoft Office Word 2007</a>. This was supposed to let you author documents that were compliant with the complex NLM document schema using Word. Again, it would have worked to lock documents to Word 2007, but I also found it to be pretty well unusable, and having seen a few of these classes of application over the years I didn&#8217;t give it much chance of survival in the wild. Now, I guess its future is in doubt, and I&#8217;d still love to find the resources to try NLM authoring using the ICE system or a similar style-based system. I&#8217;m assuming that MS won&#8217;t be under an injunction to drop styles support any time soon, although they have <a href="http://ptsefton.com/blog/2006/12/01/dont-bury-styles/">done their best to bury them under the new user interface and make it less likely that people will use them</a>.</p>
</li>
<li>
<p>I counselled my colleagues at Cambridge working on the <a href="http://research.microsoft.com/en-us/projects/chem4word/">Chem4Word</a> plugin to be cautious, citing lock-in, usability and maintainability. I wonder if they have heard anything about what might happen to this tool now? (I still think the best approach would be to use OLE to embed the chemical editor and simple features like links, fields and styles for the rest, which would mean that the tool could interoperate between different word processors including older versions of Word).</p>
</li>
</ul>
<p>So, if Microsoft is going to pull Custom XML out of Word, at least in the USA, then I was right: it was a trap, it just turned out to have even nastier teeth than I thought. Now, I&#8217;d be really happy to have our group help any creators of open source academic word processor plugins that have been snared. We&#8217;re happy to share our experience on how to build interoperable, simple, robust authoring tools.</p>
<p />
<p class="center">Copyright Peter Sefton, 2009. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt; </p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/" name="HTTP:::DBPEDIA.ORG:SNORQL:?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%"><img alt="HTTP://DBPEDIA.ORG/SNORQL/?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2009/12/m40ca94ba.png" style="border:0px; vertical-align: top" width="88" /></a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2009/12/23/bye-bye-word-2007-custom-xml.htm/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>DRFIC, Tokyo</title>
		<link>http://ptsefton.com/2009/12/09/drfic-tokyo.htm</link>
		<comments>http://ptsefton.com/2009/12/09/drfic-tokyo.htm#comments</comments>
		<pubDate>Wed, 09 Dec 2009 06:35:20 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2009/12/09/drfic-tokyo.htm</guid>
		<description><![CDATA[


Suggestions for collaboration based on hard-won experience


On Tuesday last week I took to the stage at the University of the Sunshine Coast Innovation Centre for the CAIRSS community day, I talked about seven things we&#8217;re doing wrong in the world of repositories. Negative? Maybe, but there&#8217;s no point in spending all our time patting ourselves [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2009/12/09/drfic-tokyo.htm"><!-- &nbsp; --></abbr>
<div>
<div>
<h2>Suggestions for collaboration based on hard-won experience</h2>
<div class="body">
<div>
<p>On Tuesday last week I took to the stage at the University of the Sunshine Coast Innovation Centre for the <a href="http://cairss.caul.edu.au/">CAIRSS</a> community day, where I talked about seven things we&#8217;re doing wrong in the world of repositories. Negative? Maybe, but there&#8217;s no point in spending all our time patting ourselves on the back; I thought I might spark some discussion, and it seemed to work. My talk was in the session on software, so most of what I had to say was to do with technology, but some of it was policy and governance. That presentation will be posted on the CAIRSS site soon with commentary on the CAIRSS blog.</p>
<p>By Friday I was in Tokyo, at the Tokyo Institute of Technology in a building rejoicing in the name <span class="spCh spChx201c">&#8220;</span>Tokyo Tech Front<span class="spCh spChx201d">&#8221;</span>, another nice new venue. This time I was presenting a similar but more positive and less technically oriented list that I put together with Kate Watson, my CAIRSS colleague. The theme of the conference was <a href="http://drfic2009.jp/index_en.htm">Open Access now and in the future, from the global and Asia-Pacific points of view</a> so we put together a summary of where we think Australian university Institutional Repositories are up to from an on-the-ground perspective, with some suggestions for future collaboration based on our own experience of four years or so of IR development in Australia.</p>
<p>There was a lot of emphasis on the Open Access movement at this conference, along with discussion of how to recruit repository deposits. I think the first lot of Australian IRs were mostly motivated by Open Access, but Katie and I note that while this is still a concern, the biggest driver for an Australian university to have a repository is the good old ERA: <a href="http://www.arc.gov.au/era/default.htm">The Excellence in Research for Australia Initiative</a>. Open Access might get you cited, extend the reach of your research and chip away at rationalising the scholarly communications process, but the ERA will get government cash. I think that the ERA is having a galvanising effect on populating repositories in Australia, at least with metadata and &#8216;dark&#8217; publisher&#8217;s versions of articles, maybe at the cost of openness, for now.</p>
<p>I won&#8217;t try to summarise <a href="http://drfic2009.jp/program_.htm">everything on the program</a>, but Alicia L<span class="spCh spChxf3">&#243;</span>pez Medina of COAR/UNED had a strong thread about OA running through her talk, as did Salvatore Mele from CERN, which is unsurprising given that those physicists are very OA, using <a href="http://arxiv.org/">ArXiv</a> as they do. Salvatore also talked about <a href="http://scoap3.org/">SCOAP<sup><sup>3</sup></sup></a>. As the site says, <span class="spCh spChx201c">&#8220;</span>A consortium facilitates Open Access publishing in High Energy Physics by re-directing subscription money.<span class="spCh spChx201d">&#8221;</span> (Alicia pointed out that the percentage of literature covered by SCOAP3 is very small so we should not relax our efforts to recruit deposits just yet).</p>
<p>On the recruitment side, David Shulenburger (Association of Public and Land-grant Universities) talked a lot about repository deposit mandates. The chair, Syun Tutiya, asked me to comment on the state of repository mandates in Australia, and I said that I think the whole open access thing has gone off the boil a bit in Australia, probably because of the ERA. Some of the requirements for the ERA are the opposite of OA, so people have been figuring out how to limit, rather than promote, access to some content. But I couldn&#8217;t help noting that we have mandatory income tax in Australia, and some people go to great lengths to avoid it (that&#8217;s why I think we need to re-brand it as &#8216;sharing&#8217;, which we are all taught from an early age is a good thing, an idea that resonates with the success of services like <a href="http://www.slideshare.net/">slideshare</a> in academia).</p>
<p>This was my first time presenting with a translator, so I was glad that the presentation was relatively simple; I took it one point at a time and paused for the Japanese version. The talk following mine, by Hideki Uchijima (Kanazawa University), summarised the state of play in Japan; I hope the presentation turns up online, as I have lost my printed proceedings. From memory about a quarter of circa 800 (!) universities have repositories, although that sounds better if you look at unis that award PhDs; and while there are metadata standards to which most repositories adhere, I gather that the Japanese experience of harvesting multiple repositories is like ours in Australia: &#8216;some normalisation required&#8217; via &#8216;crosswalks&#8217;.</p>
<p>I was interested to see Susan Gibbons presenting on the outcome of University of Rochester&#8217;s approach of <i>Studying Users to Design a Better Repository</i>, resulting in their new IR+ software, written by Nathan Sarr, which Tim McCallum is installing in our CAIRSS sandbox so Australian IR people can kick the tyres.</p>
<p>Mikiko Tanifuji of the National Institute for Materials Science pointed out to me in a break that the situation in Japan is like Australia in that <b>we</b> have CAIRSS which is for universities, and <b>they</b> have the Digital Repository Federation which is likewise focussed, and wouldn&#8217;t it be nice if we could be a bit more inclusive. Sounds like a good idea to me. Anybody care to fund it?</p>
<p>My 3 votes for <a href="http://drfic2009.jp/program_en1.htm">the posters</a> went to: 16, 17 and 10.</p>
<p>The slides follow below, thanks to a new feature in The Fascinator that allows blog posts (and soon repository deposits) to be composed of more than one digital item.</p>
<p class="center">Copyright Peter Sefton, 2009. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt;</p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/" name="HTTP:::DBPEDIA.ORG:SNORQL:?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%"><!-- --><img alt="HTTP://DBPEDIA.ORG/SNORQL/?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2009/12/drfic1_filesm40ca94ba.png.png" style="border:0px; vertical-align: top" width="88"><br />
</a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
</div>
<div>
<h2>Slides</h2>
<div class="body">
<div>
<div class='slide'>
<h1>Page 1</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81500.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81500.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 2</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81501.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81501.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 3</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81502.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81502.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 4</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81503.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81503.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 5</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81504.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81504.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 6</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81505.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81505.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 7</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81506.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81506.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 8</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81507.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81507.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 9</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81508.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81508.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 10</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81509.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81509.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 11</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81510.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81510.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 12</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81511.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81511.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 13</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81512.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81512.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 14</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81513.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81513.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 15</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81514.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81514.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 16</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81515.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81515.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 17</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81516.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81516.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 18</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81517.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81517.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 19</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81518.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81518.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 20</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81519.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81519.jpg.jpeg' width='552' height='413'>
</p>
</div>
<div class='slide'>
<h1>Page 21</h1>
<p><img name='pt-japan_ibbr815_files/pt-japan_ibbr81520.jpg' src='http://ptsefton.com/wp-content/uploads/2009/12/pt-japan_ibbr815_filespt-japan_ibbr81520.jpg.jpeg' width='552' height='413'>
</p>
</div>
</div>
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2009/12/09/drfic-tokyo.htm/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>ICE Week, background</title>
		<link>http://ptsefton.com/2009/11/25/ice-week-background.htm</link>
		<comments>http://ptsefton.com/2009/11/25/ice-week-background.htm#comments</comments>
		<pubDate>Wed, 25 Nov 2009 01:52:26 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2009/11/25/ice-week-background.htm</guid>
		<description><![CDATA[
This week at the Australian Digital Futures Institute we&#8217;re taking a look at ICE: the Integrated Content Environment, an open source software system which is now part of the core infrastructure at USQ.
The ADFI team are spending all week exploring new stuff and considering what ICE 3 might look like and how [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2009/11/25/ice-week-background.htm"><!-- &nbsp; --></abbr>
<div class="rendition-links"><span class="pdf-rendition-link"><a href="http://ptsefton.com/wp-content/uploads/2009/11/ICEWeek1.pdf.pdf" title="View the printable version of this page">PDF version</a></span></div>
<div class="page-toc">
<ul>
<li><a href="#id1">History</a>
<ul>
<li><a href="#id2">FrameMaker</a></li>
<li><a href="#id4">LaTeX</a></li>
<li><a href="#id5">Lessons?</a></li>
<li><a href="#id6">GOOD</a></li>
<li><a href="#id10">CPS</a></li>
<li><a href="#id13">ICE</a></li>
</ul>
</li>
<li><a href="#id17">Future directions?</a></li>
<li><a href="#id19">Working parties</a></li>
<li><a href="#id21">References</a></li>
</ul>
</div>
<div class="body">
<div>
<p>This week at the <a href="http://www.usq.edu.au/adfi">Australian Digital Futures Institute</a> we&#8217;re taking a look at ICE: the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a>, an open source software system which is now part of the core infrastructure at USQ.</p>
<p>The ADFI team are spending all week exploring new stuff and considering what ICE 3 might look like and how it might fit with or merge with our other project, <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
<p>The week kicked off with a couple of sessions with the USQ community. The first session was by invitation: we asked along a number of USQ staff who have engaged with ICE over the last few years, including some who have been vocal critics. The second session was an open-invitation ADFI event. In both cases we went through this agenda:</p>
<ul class="lib">
<li>
<p>How did we get here? (and what did we learn?)</p>
</li>
<li>
<p>Your concerns</p>
</li>
<li>
<p>What should we do next?</p>
</li>
</ul>
<p>Bron Chandler has <a href="http://ice.usq.edu.au/blog/2009/11/23/ice-the-future.html">summarised some of the points already</a>. It&#8217;s halfway through the week now and I have been jotting down my thoughts in this post since Monday. This is a bit long-winded, but I hope it is a useful record of some of the lessons learned from the last few years of content-creation systems at USQ. It&#8217;s on my blog, so it&#8217;s my opinion; comments are welcome to expand, clarify, correct or argue.</p>
<p>Remember that USQ is a distance-education specialist and produces lots of courseware: book-length study materials that are delivered in print/PDF and/or HTML to thousands of students. Teaching and learning is not all about courseware, of course, but courseware is a big part of what we do, and that&#8217;s what ICE was developed for.</p>
<h1><a id="id1" name="id1"><!--id1--></a>History</h1>
<h2><a id="id2" name="id2"><!--id2--></a>FrameMaker</h2>
<p>Adobe FrameMaker has been used since the 1990s at USQ to produce courseware for print. It&#8217;s still used for some print-only courses (largely ones with lots of maths I think). The problem with our use of FrameMaker is all the manual steps involved. Authors submit manuscripts in Word, these are converted to Frame by operators, largely by hand, and then proofing is done by the academics in Word with track changes, or on paper with corrections. Staff in Electronic Publishing Services then have to update the Frame documents change-by-change. This legacy process is being phased out, but these things take time.</p>
<p>USQ has never had a fully automated web-conversion process for FrameMaker documents, although there have been a couple of attempts, and there is now a process for converting to ICE in a semi-automated way. In a decidedly un-automated process, from 1999 or so until 2003-ish USQ used to send FrameMaker documents over to NextEd at the USQ Toowoomba campus to have courses converted to HTML by hand. I recall this costing circa $5000 US per course, payable every semester: because of the almost complete lack of automation, the courses had to be redone with each offering. Apparently some of the HTML courses from the NextEd days are still extant and being maintained using Dreamweaver and the like.</p>
<p>Bron Chandler and Ron Ward and I from the current ADFI team all worked for NextEd, and we still go to the support-group meetings from time to time but I think we&#8217;re getting better.</p>
<p>At NextEd I devised a word-processor based publishing system built around FrameMaker which would have slashed the cost of producing courses for both print and HTML at USQ. The system used Frame&#8217;s structured editing mode with XHTML as a document model, so you could produce high quality print documents and export them straight to the web <span class="spCh spChx2013">&#8211;</span> and because it used styles in FrameMaker you could import word processing documents and have FrameMaker automatically add structure. NextEd got a pilot system up in a few weeks, with help from <a href="http://allette.com.au/">Allette Systems</a>. I thought it was rather a good idea but USQ was not interested in that, or in our other publishing system, the CPS, because they had other plans, in the form of an XML system.</p>
<h2><b><a id="id3" name="id3"><!--id3--></a>Lessons learned?</b></h2>
<p>I think it was pretty broadly recognised that all the manual work involved in producing courseware this way was neither sustainable nor sensible, which is why USQ tried to build an XML publishing system. The main lesson here is that systems change can take a long time in a university and document conversion is always slow and expensive, which is why we still have some FrameMaker documents.</p>
<h2><a id="id4" name="id4"><!--id4--></a>LaTeX</h2>
<p>Before looking at the XML world, though, there&#8217;s another legacy system, LaTeX.</p>
<ul class="lib">
<li>
<p>Used in some technical disciplines.</p>
</li>
<li>
<p>No universal web conversion (really, there isn&#8217;t).</p>
</li>
</ul>
<p>There&#8217;s not a whole lot to say about this. The people who know how to use it continue to do so to produce courseware, and it continues to make no inroads into other disciplines. I hear, though, that in engineering they are starting to get new staff who don&#8217;t know LaTeX. The problem is that Word and OpenOffice.org are not great when it comes to very large amounts of maths, and ICE uses OpenOffice.org to make PDF, which is a bit buggy for maths. So there is still a place for LaTeX, even if that means embedding LaTeX in word processing documents and automatically rendering the maths, at least until further investments are made in ICE to improve its MathML support. USQ needs to work this out.</p>
<p>Participants in the sessions this week emphasised some of the good points of LaTeX:</p>
<ul class="lib">
<li>
<p>BibTeX referencing is vastly superior to anything else, we heard. (I&#8217;m not an expert but I believe (via Bruce Darcus) that if this is true it is only so for some scientific disciplines. ICE works with the university-supported EndNote and with the open source Zotero.)</p>
</li>
<li>
<p>LaTeX&#8217;s rendering is better than what we produce using ICE/FrameMaker. (Certainly true for maths, probably not that important for most of the rest of what we do <span class="spCh spChx2013">&#8211;</span> but even if you can produce PDF with links, is that enough? For web use I&#8217;d like to see more usable, fluid materials like the stuff we did in ICE-TheOREM, with live, interactive chemical models embedded in web pages, or interactive maps for documents with geographical references: easy in HTML and not possible in PDF.)</p>
</li>
</ul>
<p>We had a follow-up meeting about LaTeX (and online document editors, more of which later) and reached the conclusion that ICE might be able to help manage LaTeX files and bundle them as courses, if communities of users could agree on which LaTeX styles or macros to use. Some people insist that for maths-heavy courseware the only practical delivery medium is PDF.</p>
<h2><a id="id5" name="id5"><!--id5--></a><b>Lessons?</b></h2>
<p>We&#8217;ve learned that where a community is using a tool that meets their requirements it is best to leave them to it. Apparently, though, there is pressure on some of the LaTeX users to use ICE as per USQ policy. Me, I think the policy should be based on performance outcomes, not on mandating how people should achieve those outcomes. I&#8217;d love to see a competitor/successor to ICE emerge from someone who&#8217;s a bit feral and won&#8217;t use the corporate tools (but it ain&#8217;t going to be LaTeX-based).</p>
<h2><a id="id6" name="id6"><!--id6--></a>GOOD</h2>
<p>The GOOD system (Generic Online Offline Delivery) was USQ&#8217;s very big, complicated bespoke XML system, which never realised its claimed potential. The idea was to build a semantically aware, highly structured courseware production system.</p>
<p>There were several issues with GOOD:</p>
<ul class="lib">
<li>
<p>Converting content from FrameMaker was a very <b>costly, slow process.</b></p>
</li>
<li>
<p>Despite the best efforts of the team to design the one true Document Type Definition (a DTD was an ancient kind of document schema they had back in the twentieth century) for USQ courseware, there was <b>exception after exception</b> as more disciplines came on board and wanted different conventions, referencing systems, extensions to the table model and so on and so on, with every discussion ending up in a big meeting to see whether a change was needed. Being a university we didn&#8217;t count the cost of those meetings, but I bet it added up, let alone the cost of all the changes.</p>
</li>
<li>
<p>It was slow to render documents, so changes like sorting out pagination or fixing typos were very painful.</p>
</li>
<li>
<p>But the biggest issue was that <b>almost none of the lecturers used it, see below</b>.</p>
</li>
</ul>
<h3><a id="id8" name="id8"><!--id8--></a><b>Lessons?</b></h3>
<p>There were lots of things that USQ should remember from the GOOD experience, most of which were well understood in the XML/SGML community before the project started, but there&#8217;s nothing like experiencing these things for yourself:</p>
<ul class="lib">
<li>
<p><b>Top-down mandating of a system is risky</b>, particularly if the people you&#8217;re telling what to do are academics in an Australian university. They&#8217;re not process workers who will take orders.</p>
</li>
<li>
<p><b>Don&#8217;t build a big system without sorting out basic issues</b> like what editor you are going to use and testing to see whether the user community will be able to use it and actually do so.</p>
</li>
<li>
<p>The system produced HTML and PDF from the same source document all right, but then so did the demo system we built at NextEd using FrameMaker, at about one hundredth of the cost. I think maybe GOOD would have had more traction <b>if all that semantic markup had been used to better effect</b>, so people could see the point of the extra work and cost involved. When I joined USQ I worked with the GOOD team (particularly Oliver Lucido, who now works with us in ADFI) to demo some of the potential, but our proposals never made it into the learning management system which the students used.</p>
</li>
</ul>
<h2><a id="id10" name="id10"><!--id10--></a>CPS</h2>
<p>The NextEd Continuous Publishing System was my baby at NextEd, sponsored by another current USQer Cameron Loudon who ran the conversion team. It was a word processor based HTML publishing system which many of our clients used and which was used for the company intranet. But even when customers didn&#8217;t fully embrace it for courseware, we were able to use it internally to dramatically cut costs.</p>
<ul class="lib">
<li>
<p>Server based like ICE server.</p>
</li>
<li>
<p>Used an earlier version of ICE templates (inherited from Standards Australia where I helped set them up for writing standards).</p>
</li>
<li>
<p>No print output.</p>
</li>
<li>
<p>Banned at USQ&#8217;s Distance and e-Learning Centre. (True: even though there was a small user community who liked it, and it was open source, it was considered a risk to use. I think one of the issues was whether the open source code was truly clean-room open, and then there was the GOOD system, with which the CPS competed (except for the lack of print output).)</p>
</li>
</ul>
<h3><a id="id12" name="id12"><!--id12--></a><b>Lessons</b>?</h3>
<p class="P2">The CPS worked pretty well. Ron Ward did most of the work on the later versions, and together we learned a lot about modern software development and how to keep things as simple as possible. For example, it turned out we didn&#8217;t need lots of the features we thought we needed, and even the simplest model we could think of for document reuse across semesters was too hard for most users to get their heads around.</p>
<p class="P2">It was eventually made open source, but too late, meaning that all that work died with most of NextEd&#8217;s business. I resolved that if I worked on software for other people again I would either (a) get paid substantial amounts of money or (b) get to release the software under an open source license so I could build on it in future, never mind the other potential benefits of open source software.</p>
<p class="P2">The big thing that I miss about CPS, which is so far not present in ICE, is that it was very much driven by metadata, which means that courses self-assembled as you uploaded and described parts of a course. This is an area we are exploring with our ICE/<a href="http://fascinator.usq.edu.au/">Fascinator</a> mashup, which we hope will be used to serve all the university&#8217;s policies and procedures in a faceted browse/search interface before too long.</p>
<h2><a id="id13" name="id13"><!--id13--></a>ICE</h2>
<p>Regulars here probably know more than they want to about ICE and you can read about it on the website, and in various papers (Sefton et al. 2009; Sefton 2006; Sefton 2007).</p>
<ul class="lib">
<li>
<p>Originally called GOOD-lite (2004) for (internal) marketing reasons and changed immediately to The Integrated Courseware Environment for (external) marketing reasons. (Then we changed Courseware to Content to go after a grant.)</p>
</li>
<li>
<p>Approved within DeC without the benefit of the kind of USQ governance we now have.</p>
</li>
<li>
<p>Grew organically from a user-base of one.</p>
</li>
<li>
<p>Core system in 2009.</p>
</li>
</ul>
<h3><a id="id15" name="id15"><!--id15--></a>Lessons?</h3>
<p>Lots of lessons, but we&#8217;re still working out what they are. The big ones for me are:</p>
<ul class="lib">
<li>
<p>Trying to build a replicated, version-controlled content management system as well as the core ICE function, which was to make HTML and PDF courseware from word-processing files, was a big mistake and cost us a lot for limited benefit. My fault, I think, for getting carried away with <a href="http://www.joelonsoftware.com/items/2008/05/01.html">architectural space-travel</a>. We&#8217;re going to see if we can get away without using Subversion or anything remotely like it for future versions and focus on the things that ICE does that no other system does, mainly having good generic word processing templates and turning them into HTML. Yes, it is strange that no other open system does this, but no, we don&#8217;t know of anything comparable.</p>
</li>
<li>
<p>The agile, organic approach worked well to make the actual software but because we started the project under local governance, just before a big project to centralise uni IT services, by the time it was ready to roll out there was a whole new governance framework in place and it took longer than it should to navigate that. Future projects need to move into core mode much more smoothly.</p>
</li>
</ul>
<h1><a id="id17" name="id17"><!--id17--></a>Future directions?</h1>
<p>I put up a slide for discussion with some bullet points; here are some brief notes on each point:</p>
<ul class="lib">
<li>
<p><b>Concept Maps? (<a href="http://ice.usq.edu.au/blog/2009/11/25/ice-concept-maps.html">Bron&#8217;s summary</a>)</b></p>
<p>Mark Phythian and co (Phythian &amp; Das Gupta 2008) have been trialling Concept Maps as a learning and teaching aid. Mark started by wondering if Concept Maps could provide a navigation aid for courses, then started looking at how they might help learners. We had a meeting on Tuesday which affirmed that ADFI will keep helping with this line of research, with a view to building open tools for the use of concept maps in learning, teaching and research, as indicated by ongoing evaluations like those Mark has been doing.</p>
<p>We also heard about Mind Maps (don&#8217;t get me started on that one) and something called Argument Maps, which were new to me. Duncan is exploring how <a href="http://www.aus-e-lit.net/">work on Aus-e-Lit</a> might be used to build concept-map-like navigation or aggregation in our tools.</p>
</li>
<li>
<p><b>Efficiency:</b></p>
<p>There were a few points that came down to ICE efficiency and performance; there are already processes under way to make ICE more responsive by getting some of the large video content out of it and into more suitable repositories, building towards university-wide media and courseware repository and discovery services. ADFI may have a role to play in developing some of this infrastructure, and we are gearing up to build and pilot some software along these lines in 2010.</p>
<ul class="lib">
<li>
<p><b>Media repository?</b></p>
<p>Yep, we know we need it and there are people looking at this. It is clear that we need some kind of repository of course content to remember what we served up to students; Bron Chandler is looking at the new version (2) of the Moodle learning management system to see how repository-like it is.</p>
</li>
<li>
<p><b>Drop versioning from ICE?</b></p>
<p>It seems that versioning is not one of ICE&#8217;s most used features and people would be happy to sacrifice it for extra speed. Some of the maths and computing people would lament the loss of Subversion, but I figure they know how to type: <code>svn add *; svn commit -m 'Finished for semester one!'</code></p>
</li>
<li>
<p><b>Syncing?</b></p>
<p>Ditto.</p>
</li>
</ul>
</li>
<li>
<p><b>Google Docs and other online editors?</b></p>
<p>We had a group looking at online editors today. Stijn Dekeyser is particularly interested in either working with Google Docs and its APIs to do some ICE integration, or better yet designing a collaborative, structured, semantically aware editor. The latter sounds fun, but it would be a huge project and I think it would be well beyond us without a very strong partner. We will look at opportunities for work in this area where we can. Via Anna Gerber I got a tip to look at the Google Docs Base Editor, which uses the Google Docs APIs:</p>
<blockquote class="bq">
<p><a href="http://twitter.com/AnnaGerber"><span class="Strong_20_Emphasis">AnnaGerber</span></a></p>
<p>@<a href="http://twitter.com/ptsefton">ptsefton</a> Have you seen Google Docs Base Editor? <a href="http://code.google.com/p/gdbe/">code.google.com/p/gdbe/</a></p>
</blockquote>
</li>
<li>
<p><b>Google Wave (no)</b></p>
<p>See my <a href="http://ptsefton.com/2009/11/17/wave-as-a-scholarly-document-editor-not-promising-at-this-stage.htm">recent critique</a>.</p>
</li>
<li>
<p><b>More LaTeX support?</b></p>
<p>I wrote about this above. If the LaTeX users can organise themselves, there might be a case for extending ICE&#8217;s limited support for LaTeX to help with the course-management side of things. Look, if someone is telling a mathematician to dump LaTeX and use Word or OpenOffice.org just because university policy is to &#8216;use ICE&#8217; then I think that&#8217;s wrong and I&#8217;m happy to help them fight their case. But I&#8217;m also going to fight against everyone who wants to just put PDF on the web and not take advantage of the web in every possible way, so I support PDF-only courses only where they are so maths-heavy there is no other practical way to deliver them right now.</p>
</li>
<li>
<p><b>Ebook delivery</b></p>
<p>I have been using and loving an ebook reader on my Android phone, an experience I will go into in more detail soon. There was interest in adding eBook publishing to ICE from both the library, where ex-RUBRIC colleague Alison Hunter emphasised the importance of being able to deliver electronic books as part of the library of the future, and from the Learning and Teaching Support Unit, where Michael Sankey thought eBook delivery would be important for open educational resources. (Which reminds me, I need to make some more <a href="http://ptsefton.com/2009/05/05/three-big-hairy-audacious-goals-for-an-open-usq.htm">noise about Open Courseware and other open things we could be doing</a>.)</p>
<p>Linda Octalina and Cynthia Wong got ICE packaging books in EPUB format in less than a day and we&#8217;re ironing out bugs and testing in Stanza on the iPhone and Aldiko on Android. When that&#8217;s done we will hand the plugin over to Michael at LTSU and see if we can get some trials going.</p>
</li>
<li>
<p><b>SiteCore/SharePoint?</b></p>
<p>These are the corporate web CMS and intranet systems. It would be nice if they could understand ICE documents; we&#8217;re not going to tackle these this week, but I hope that USQ gets around to it one day. Given that we can produce high-quality HTML pages for courses from the tool that most people use for document authoring (MS Word), it seems a pity not to extend that to more corporate documents. There is the forthcoming policies and procedures site, which will show just how much <b>better ICE documents are</b> than standard ad-hoc unstyled Word documents.</p>
</li>
<li>
<p><b>Moodle tie-in?</b></p>
<p>This is a big one, for which we don&#8217;t have a lot of data yet. Bron Chandler is investigating Moodle 2.</p>
</li>
<li>
<p><b>Theses?</b></p>
<p>Nothing much to report since <a href="http://ptsefton.com/2009/08/24/ice-for-theses-thesice-where-we-are-we-up-to.htm">August</a>, but I hope to talk more with people at ANU about theses. As I write this, my honours thesis is not functioning in ePub format, possibly because the title is too long: <i>MAKING PLANS FOR NIGEL: Defining interfaces between computational representations of linguistic structure and output systems: Adding intonation, punctuation and typography systems to the PENMAN system.</i></p>
</li>
<li>
<p><b>Journals (OJS)?</b></p>
<p>There&#8217;s a project to get OJS up at USQ, and we have had a long running dialogue with the PKP folks about ICE integration, but nothing new to report at this stage.</p>
</li>
<li>
<p><b>Annotations!!</b></p>
<p>There was strong support for taking the kind of in-document, in-browser annotation we have in ICE as an authoring service and making it more generally available. Ron Ward is working with Duncan Dickinson to see if we can get the open <a href="http://metadata.net/sfprojects/dannotate.html">Dannotate</a> system going in our systems, with a view to having rich discussions in context as part of the teaching and learning process. We are keen to get annotations going for eResearch too, for peer review, and for public participation in research in our work with the Public Memory Research Centre.</p>
</li>
<li>
<p>Various contributions about potential functionality from Michael Sankey (thanks!): syndication of audio and video content from YouTube and Facebook, inline quizzes, InfoPath/SharePoint forms integration, implementation of a global glossary, and integration with Mahara.</p>
<p>We&#8217;re going to find out more about Mahara and how we might bridge between it and ICE and media repositories and Eprints and our eReserve system and library catalogue and so on.</p>
</li>
</ul>
<h1><a id="id19" name="id19"><!--id19--></a>Working parties</h1>
<p>And we have four mini &#8216;sprint&#8217; developments/investigations happening in the tech team this week. These are designed to explore some of the many things that were brought up by our colleagues where we think we can get something to show quickly that will advance ICE significantly:</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p><b>Annotations:</b> Ron Ward &amp; Duncan Dickinson. The point here is to look for generic annotation services for academia covering comment, discussion, notes for personal use, peer review, marking and so on, using a common web-based system across multiple web applications. The basis for this is <a href="http://metadata.net/sfprojects/dannotate.html">Dannotate</a>, to which we are adding some user-interface tweaks.</p>
</li>
<li>
<p><b>ePub-format books</b>: Linda Octalina and Cynthia Wong have this mostly working <span class="spCh spChx2013">&#8211;</span> we&#8217;ll post some examples soon.</p>
</li>
<li>
<p><b>Packaging of arbitrary collections of resources</b> (ICE, Flickr, images, data) using The Fascinator: Oliver Lucido is looking at a proof of concept here, to show a possible ICE future.</p>
</li>
<li>
<p><b>Moodle 2 integration possibilities</b>: Bron Chandler. This is fact-finding at this stage, but if we can hook in any of the above into a demo then we will.</p>
</li>
</ol>
<h1><a id="id21" name="id21"><!--id21--></a>References</h1>
<p class="P5">Phythian, M.W. &amp; Das Gupta, J., 2008. Hyperlinked concept map enhancements for electronic study materials. Available at: http://eprints.usq.edu.au/4776/ [Accessed November 24, 2009].</p>
<p class="P5">Sefton, P., 2007. An integrated approach to preparing, publishing, presenting and preserving theses. In <i>ETD 2007</i><span style="font-style:normal;"><span class="T5">. Uppsala. Available at: http://eprints.usq.edu.au/archive/00002653/ [Accessed July 2, 2007].</span></span></p>
<p class="P5"><span style="font-style:normal;"><span class="T5">Sefton, P., 2006. The Integrated Content Environment for Research and Scholarship.</span></span> <i>ICE Website</i><span style="font-style:normal;"><span class="T5">. Available at: http://ice.usq.edu.au/introduction/ice_rs.htm [Accessed April 30, 2007].</span></span></p>
<p class="P5"><span style="font-style:normal;"><span class="T5">Sefton, P., Downing, J. &amp; Day, N., 2009. ICE-theorem &#8211; end to end semantically aware eResearch</span></span> <span style="font-style:normal;"><span class="T5">infrastructure for theses.</span></span> <i>University of Southern Queensland</i><span style="font-style:normal;"><span class="T5">. Available at: http://eprints.usq.edu.au/5248/1/ice-theorem-paper-OR09.htm [Accessed August 24, 2009].</span></span></p>
<p class="center">Copyright Peter Sefton, 2009. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt;</p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/"><!-- --><img alt="Creative Commons Attribution-Share Alike 2.5 Australia licence" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2009/11/ICEWeek1_filesm40ca94ba.png.png" style="border:0px; vertical-align: top" width="88"><br />
</a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2009/11/25/ice-week-background.htm/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Promising developments in the Australian Access Federation</title>
		<link>http://ptsefton.com/2009/11/18/promising-developments-in-the-australian-access-federation.htm</link>
		<comments>http://ptsefton.com/2009/11/18/promising-developments-in-the-australian-access-federation.htm#comments</comments>
		<pubDate>Wed, 18 Nov 2009 05:11:12 +0000</pubDate>
		<dc:creator>ptsefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ptsefton.com/2009/11/18/promising-developments-in-the-australian-access-federation.htm</guid>
		<description><![CDATA[
PDF version


At the eResearch Australasia conference last week I attended a Birds of a Feather session, Authorisation community developments run by Markus Buchhorn (Intersect), Lyle Winton (Victorian eResearch Strategic Initiative), Clare Sloggett (Intersect) and Neil Witheridge (ARCS). Clare asked me to come along, possibly because she knew I had been critical of some aspects of [...]]]></description>
			<content:encoded><![CDATA[<abbr class="unapi-id" title="http://ptsefton.com/2009/11/18/promising-developments-in-the-australian-access-federation.htm"><!-- &nbsp; --></abbr>
<div class="rendition-links"><span class="pdf-rendition-link"><a href="http://ptsefton.com/wp-content/uploads/2009/11/encouraging-developments-at-AAF.pdf.pdf" title="View the printable version of this page">PDF version</a></span></div>
<div class="body">
<div>
<p>At the eResearch Australasia conference last week I attended a Birds of a Feather session, <a href="http://www.eresearch.edu.au/2009bof08">Authorisation community developments</a>, run by Markus Buchhorn (Intersect), Lyle Winton (Victorian eResearch Strategic Initiative), Clare Sloggett (Intersect) and Neil Witheridge (ARCS). Clare asked me to come along, possibly because she knew I had been critical of some aspects of the Australian Access Federation machinery and how it is supposed to work. I am pleased to report, though, that it sounds like some things have changed for the better.</p>
<p>About a year ago, I addressed the ARROW repository community about the then-new Object Reuse and Exchange (ORE) standard. I <a href="http://ptsefton.com/2008/10/14/what-the-oai-ore-protocol-can-do-for-you.htm#id4">started with a slightly belligerent rant about another standard, XACML</a> (eXtensible Access Control Markup Language). XACML was meant to be part of how the Australian Access Federation (<a href="http://www.aaf.edu.au/">AAF</a>) would allow all of us in the Australian higher-education and research community to access each other&#8217;s stuff based on our <i>roles</i>.</p>
<p>Back then <a href="http://ptsefton.com/2008/10/14/what-the-oai-ore-protocol-can-do-for-you.htm#id4">I noted</a> the difficulty with defining role attributes in a way that would work cross-institution:</p>
<blockquote class="bq">
<p>I was always vaguely worried about how XACML policies were going to work but one day I met Kent Fitch who really nailed it. On the subject of these use cases for XACML where you, an anthropologist want to grant access to a repository to other anthropologists, he asked <span class="spCh spChx201c">&#8220;</span>What<span class="spCh spChx2019">&#8217;</span>s an anthropologist?<span class="spCh spChx201d">&#8221;</span></p>
<p>This is a very, very good question. Does an academic working in the education faculty who self-identifies as a visual ethnographer qualify? What if she<span class="spCh spChx2019">&#8217;</span>s got an honours degree in anthropology? [<span class="spCh spChx2026">&#8230;</span>] [fixed a spelling error]</p>
</blockquote>
<p>(I finally have an answer for Kent about what an anthropologist is. Read on.)</p>
<p>My previous understanding of how the AAF would work was that my host organisation would have to be solely responsible for asserting stuff about me, who I was and what roles I had, because there was no publicly shareable identifier to allow other people to assert things about me, such as that they trusted me to look at some data or edit a document. This meant that simple, obvious use-cases like a group of ethnographers setting up a collaboration space by asking each other for their login names would not be possible; due to privacy concerns there would be a unique ID for each AAF person but <b>there would be no sharing</b>.</p>
<p>I&#8217;m pleased to report that things have changed and the AAF now <b>does encourage the use of public shareable IDs</b>. So a few points occur to me. The first couple I raised at the BoF, the last one came later.</p>
<ol class="lin" style="list-style: decimal;">
<li>
<p>This means we can <b>let people self-organise</b> by adding white-lists of known IDs to their systems. An institution might not be a reliable way to sort the anthropologists from the ethnographers, but they can sort themselves out all right and form whatever working groups they want.</p>
</li>
<li>
<p>It was noted at the BoF that some people would have more than one AAF ID. I suggested that it might be good to <b>register these in the new researcher ID system</b> that I believe is being set up at the National Library as part of the <a href="http://ands.org.au/">ANDS</a> infrastructure. I think this new NLA system will be what Nick Nicholas calls an <a href="http://blog.linkaffiliates.net.au/2009/10/23/approaches-to-fluid-identity-identifier-assertion-hubs/">Identifier Assertion Hub</a>.</p>
</li>
<li>
<p>There was some talk of <b>systems that could manage groups of AAF users</b>. This answers the biggest problem I had with the AAF: that it could not work with ad-hoc or user-defined communities, only with role attributes assigned by the home identity provider.</p>
<p>Following from this I finally worked out how to answer Kent&#8217;s question <span class="spCh spChx201c">&#8220;</span>what&#8217;s an anthropologist<span class="spCh spChx201d">&#8221;</span>. An (Australian) anthropologist is someone who&#8217;s a member of the <a href="http://www.aas.asn.au/">Australian Anthropological Society</a>.</p>
<p>In an identity federation that does include public, shareable identifiers, societies could publish lists of members or run a group-server that could be used by other services.</p>
</li>
</ol>
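<p>To make the white-list and group-server ideas above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class and function names, the group names and the IDs are made up, and none of it reflects how the AAF or any society actually publishes identifiers or membership lists.</p>

```python
# Hypothetical sketch only: the names and IDs below are invented for
# illustration and are not part of the AAF or any real group server.


class Group:
    """A user-defined group: a white-list of public, shareable IDs."""

    def __init__(self, name, member_ids=()):
        self.name = name
        self.member_ids = set(member_ids)

    def add(self, shared_id):
        """Add one shareable ID to the white-list."""
        self.member_ids.add(shared_id)

    def includes(self, shared_id):
        """Return True if the ID is on this group's white-list."""
        return shared_id in self.member_ids


def may_access(shared_id, trusted_groups):
    """Grant access if any trusted group lists the ID."""
    return any(group.includes(shared_id) for group in trusted_groups)


# A working group assembled by the researchers themselves.
ethnographers = Group("visual-ethnography", ["id-alice", "id-bob"])

# A membership list published by a society (made-up ID).
society = Group("anthropological-society", ["id-carol"])

trusted = [ethnographers, society]
print(may_access("id-carol", trusted))    # listed by the society
print(may_access("id-mallory", trusted))  # listed by nobody
```

<p>The point of the sketch is that the service never needs a role attribute from a home institution. It only needs an identifier that people and societies are allowed to share and list.</p>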
<p>Finally, I can&#8217;t help commenting that if we&#8217;re going to have a federation where a lot of the trust relationships are devolved to the users, allowing them to assemble groups, or to societies like the AAS, then maybe we should consider allowing the use of <a href="http://openid.net/">OpenID</a> as well as or instead of <a href="http://shibboleth.internet2.edu/">Shibboleth</a>. I bet not everyone in the AAS has an institutional login at an AAF member, even once the AAF is fully operational with wall-to-wall Shibboleth, but anyone can get an OpenID.</p>
<p class="center">Copyright Peter Sefton, 2009. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. &lt;<a href="http://creativecommons.org/licenses/by-sa/2.5/au/">http://creativecommons.org/licenses/by-sa/2.5/au/</a>&gt;</p>
<p class="center"><a href="http://creativecommons.org/licenses/by-sa/2.5/au/" name="HTTP:::DBPEDIA.ORG:SNORQL:?QUERY=SELECT+%3FRESOURCE%0D%0AWHERE+{+%0D%0A%3FRESOURCE+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%2FBIRTHPLACE%3E+%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FRESOURCE%2FSYDNEY%3E+%3B%0D%0A%3CHTTP%3A%2F%2FDBPEDIA.ORG%2FONTOLOGY%2FPERSON%"><!-- --><img alt="Creative Commons licence" class="fr1" height="31" src="http://ptsefton.com/wp-content/uploads/2009/11/encouraging-developments-at-AAF_filesm40ca94ba.png.png" style="border:0px; vertical-align: top" width="88"><br />
</a></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/">Integrated Content Environment</a> project and published to WordPress using <a href="http://fascinator.usq.edu.au/desktop/desktop.htm">The Fascinator</a>.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ptsefton.com/2009/11/18/promising-developments-in-the-australian-access-federation.htm/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
