ptsefton.com

A killer app for handles

2007-03-16

I've touched on persistent identifiers in general and handles in particular here a couple of times. One time I wrote about plain text citations. Another time I looked at migration issues. I've been what you might call a bit of a handle sceptic, 'cos I didn't really see how they fitted with real-life web-based repositories, but this week, for the first time, I think I can see how new services built to exploit handles could really make life easier. I'll go through a bit of background on the issues first, then look at what I think may be a killer app for handles. Killer enough to consider using them sooner rather than later, anyway.

About handles

I speculated a while ago about whether a full-text citation makes a good persistent identifier for a conference paper. The answer was sort of no, as far as automated lookup goes, although the clues offered by a text citation mean you can do some detective work to find something.

Another approach to naming things is to use a handle. A handle is a name with some special properties, and it is backed by computer infrastructure designed to help you find the digital object that the handle names.

Let's look at a real example.

Here's a handle: 1959.1/2635

This is a name that's guaranteed to be unique within the handle infrastructure. It might also be a part number for a low-cost cutting-edge high-performance fibre-composite prosthetic chicken foot, made by a small company in Gatton, but that's irrelevant.

The handle website explains:

Within the handle namespace, every identifier consists of two parts: its handle prefix, also known as a "naming authority", and a suffix or unique "local name" under the prefix. The prefix and suffix are separated by the ASCII character "/". An identifier may thus be defined as

<Handle> ::= <Handle Prefix> "/" <Handle Suffix>

For example, "10.1045/april2006-paskin" is an identifier (also known as a Digital Object Identifier (DOI), an implementation of the Handle System) for an article published in D-Lib Magazine. It is defined under the prefix (naming authority) "10.1045", and its suffix (local name) is "april2006-paskin".

http://handle.net/overviews/handle-syntax.html
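To make the prefix/suffix split concrete, here's a small Python sketch (my own illustration, not code from the Handle System). It splits at the first "/" only, on the assumption, which is how I read the syntax above, that the suffix may itself contain further slashes; split_handle is a made-up name.

```python
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split a handle into (prefix, suffix) at the first "/".

    The prefix (naming authority) can contain dots, e.g. "1959.1";
    everything after the first slash is treated as the suffix.
    """
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


print(split_handle("1959.1/2635"))               # ('1959.1', '2635')
print(split_handle("10.1045/april2006-paskin"))  # ('10.1045', 'april2006-paskin')
```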

What can you do with the handle known as 1959.1/2635?

You can go to a handle resolver and ask it to take you to the object.

  1. Go to http://hdl.handle.net/

  2. Put the handle, 1959.1/2635, in the handle box.

  3. Leave the checkboxes alone; I won't discuss those.

  4. Hit Submit.
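The web form is the human way in, but since http://hdl.handle.net/1959.1/2635 also works as a plain URL (it's the third form in the list below), the same lookup can be scripted. A minimal Python sketch, assuming the global resolver keeps answering GET requests on that URL pattern with an HTTP redirect to wherever the handle currently points; resolver_url and resolve are names I've made up.

```python
from urllib.parse import quote
from urllib.request import urlopen

RESOLVER = "http://hdl.handle.net/"


def resolver_url(handle: str, resolver: str = RESOLVER) -> str:
    """URL that asks the resolver for handle, e.g. http://hdl.handle.net/1959.1/2635."""
    # Percent-encode characters that aren't safe in a URL path, but keep "/"
    # because the prefix/suffix separator is part of the handle.
    return resolver + quote(handle, safe="/")


def resolve(handle: str, timeout: float = 10.0) -> str:
    """Follow the resolver's redirect(s) and return the URL we end up at.

    Needs network access, and the answer is whatever the handle record says
    today, so don't expect it to match the URL quoted in this post.
    """
    with urlopen(resolver_url(handle), timeout=timeout) as response:
        return response.geturl()


print(resolver_url("1959.1/2635"))  # http://hdl.handle.net/1959.1/2635
# print(resolve("1959.1/2635"))     # live lookup; the result depends on the day
```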

Within a second or two you'll be looking at something like this: http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream=

Of course when you read this post the URL that the handle resolver takes you to might be different. But the idea is that if you use the resolver then you'll always get the metadata summary page for a paper in the Monash University institutional repository even if they change the software.

Now, that page says:

Please use this identifier to cite or link to this item: http://arrow.monash.edu.au/hdl/1959.1/2635

So if I were to mail you a link, that's the one I'm supposed to use. I could just mail you the handle and assume that you'd know what to do with it but I'm not that mean.

So, there are quite a few ways to refer to this thing:

  1. http://arrow.monash.edu.au/hdl/1959.1/2635

  2. hdl:1959.1/2635

  3. http://hdl.handle.net/1959.1/2635

  4. http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream=

  5. http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream
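The first three forms actually contain the handle, so they can be boiled back down to it mechanically. A rough Python sketch, assuming only the three patterns listed (the arrow.monash.edu.au/hdl/ path is Monash-specific); handle_from_reference is a hypothetical helper. The messy URLs give None: they carry monash:2635, not the handle, so getting from one to the other needs knowledge about the repository.

```python
import re
from typing import Optional

# Reference forms from the list above that carry the handle itself.
# The arrow.monash.edu.au/hdl/ form is specific to the Monash repository.
HANDLE_FORMS = [
    re.compile(r"^hdl:(?P<handle>[^/]+/.+)$"),
    re.compile(r"^https?://hdl\.handle\.net/(?P<handle>[^/]+/.+)$"),
    re.compile(r"^https?://arrow\.monash\.edu\.au/hdl/(?P<handle>[^/]+/.+)$"),
]


def handle_from_reference(ref: str) -> Optional[str]:
    """Return the handle embedded in ref, or None if there isn't one."""
    ref = ref.strip()
    for pattern in HANDLE_FORMS:
        match = pattern.match(ref)
        if match:
            return match.group("handle")
    return None  # e.g. the messy arrowprod URLs, which carry monash:2635 instead


for ref in [
    "http://arrow.monash.edu.au/hdl/1959.1/2635",
    "hdl:1959.1/2635",
    "http://hdl.handle.net/1959.1/2635",
    "http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream=",
]:
    print(handle_from_reference(ref))  # 1959.1/2635 three times, then None
```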

The Monash folks would like you to use the first option, so let's call that the canonical URL. Using it means that in future they could get rid of the redirect step that currently leaves you looking at a messy-looking URL (let's call that the messy URL) instead of the canonical URL.

The last example is a problem. That's the messy URL minus the final '='. Unfortunately it works, which makes it an easy error to make and a hard one to notice, but that link is highly likely to be non-persistent.
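Why does the truncated URL work at all? My guess, and it is only a guess since I haven't looked inside the repository software, is lenient query-string parsing: a parameter with no '=' gets treated as having an empty value. Python's standard-library parser behaves that way when asked to keep blank values, so it makes a handy stand-in:

```python
from urllib.parse import parse_qs, urlsplit

MESSY = "http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream="
TRUNCATED = MESSY[:-1]  # the same URL minus the final "="


def query_params(url: str) -> dict:
    # keep_blank_values=True keeps "datastream" even though its value is empty;
    # a bare "datastream" with no "=" is treated exactly like "datastream=".
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


print(query_params(MESSY))      # {'pid': ['monash:2635'], 'datastream': ['']}
print(query_params(TRUNCATED))  # {'pid': ['monash:2635'], 'datastream': ['']}
print(MESSY == TRUNCATED)       # False
```

The parameters come out identical even though the strings differ, so anything that matches old URLs as exact strings (a redirect rule, a lookup table) has to cope with both spellings.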

I bet lots of people will just copy the messy URL from the address bar. Some might even think that there's something odd going on when they use the URL they are requested to use but the browser changes it on them.

http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream=

All of which means a maintenance headache for the Monash repository minders. If they upgrade their repository software in a way that changes the URL for an item (which I happen to know they are planning to do), they will need to think about redirecting all requests for those old-fashioned messy URLs that used to work but no longer do.

This is easy enough to do: they need to tell the web server how to recognize requests directed at old versions of the repository, yank out the local identifier, monash:2635, cut it up, stick the number together with the handle prefix 1959.1 to make the complete handle 1959.1/2635, and then redirect the request to a handle resolver. That's one line of redirect code. (There's a catch: if the handles are not related to the local identifiers, this simply won't work, and you'd need a lookup table instead.)
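In real life that one line would be a rewrite rule in the web server's own configuration language. Here are the same steps spelled out as a Python sketch, under some loudly stated assumptions: old URLs look like .../access/detail.php?pid=monash:NNNN, the number after "monash:" is the handle suffix (true for 2635 here, but I'm generalising from a single example), and the redirect goes to the global resolver. redirect_for_old_url is a made-up name, and the optional lookup dict stands in for the lookup table.

```python
import re
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

HANDLE_PREFIX = "1959.1"             # Monash's naming authority
RESOLVER = "http://hdl.handle.net/"  # global handle resolver

# Assumptions, generalised from the single example in this post:
OLD_PATH = "/access/detail.php"                  # path used by the old software
LOCAL_ID = re.compile(r"^monash:(?P<num>\d+)$")  # local id; num == handle suffix


def redirect_for_old_url(url: str,
                         lookup: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return a resolver URL for an old-style repository URL, or None."""
    parts = urlsplit(url)
    if parts.path != OLD_PATH:
        return None  # not a request aimed at the old repository
    # Both the "...datastream=" and "...datastream" variants yield the same pid.
    pid = parse_qs(parts.query).get("pid", [""])[0]
    if lookup is not None:
        # Handles unrelated to local ids: consult a lookup table instead.
        handle = lookup.get(pid)
    else:
        # Cut up the local id and glue the number onto the handle prefix.
        match = LOCAL_ID.match(pid)
        handle = f"{HANDLE_PREFIX}/{match.group('num')}" if match else None
    return RESOLVER + handle if handle else None


old = "http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream="
print(redirect_for_old_url(old))  # http://hdl.handle.net/1959.1/2635
```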

There also needs to be an ongoing commitment from the institution that, as long as there's DNS (the bit that resolves the name arrow.monash.edu.au to a particular Internet address) and HTTP (the protocol your web browser uses to talk to the repository software), they will keep the redirect in place. Ongoing commitment is a policy and governance thing. It's about putting processes in place to ensure that future systems don't break things that used to work. So the simpler the thing you commit to, the more chance there is that future generations will be able to keep it up.

I used to think that, because you had to do this redirect stuff anyway, buying into handles was just another thing to worry about. Another mouth to feed. Handles are not just for Christmas.

But what if the handles stuff actually helped?

A killer app (or two) for handles?

Lyle Winton from the PILIN project has proposed a really nice-sounding service, findurl. Instead of having to work out a bit of complicated redirect code that parses a request and works out where to redirect it, for each and every bit of web software you ever install, why not ask a new handles service to find it for you? Send it the old URL and it will look for the handle record that lists the URL in question, then redirect you to the new home for the object.
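Here's a toy model of how findurl might work. The proposal as I've described it doesn't spell out a data model, so everything here is an assumption: each hypothetical handle record lists its current URL plus the URLs it used to point at, and the service builds a reverse index from every listed URL back to its handle. The new-repository.example address is a placeholder.

```python
from typing import Dict, List, Optional

# Hypothetical handle records. Real handle records hold typed values; this
# flat "current"/"previous" structure is just for illustration.
RECORDS: Dict[str, Dict[str, List[str]]] = {
    "1959.1/2635": {
        "current": ["http://new-repository.example/items/2635"],
        "previous": [
            "http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream=",
        ],
    },
}


class FindUrl:
    """Toy findurl service: old URL in, current URL of the same object out."""

    def __init__(self, records: Dict[str, Dict[str, List[str]]]):
        self.records = records
        # Reverse index from every URL any record mentions back to its handle.
        self.url_to_handle: Dict[str, str] = {}
        for handle, record in records.items():
            for url in record["current"] + record["previous"]:
                self.url_to_handle[url] = handle

    def lookup(self, old_url: str) -> Optional[str]:
        handle = self.url_to_handle.get(old_url)
        if handle is None:
            return None  # no record lists this URL
        return self.records[handle]["current"][0]


service = FindUrl(RECORDS)
print(service.lookup(
    "http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream="
))  # http://new-repository.example/items/2635
```

Exact string matching is the simple choice, and also the weak spot: the URL minus its last '=' wouldn't be found unless it, too, were listed or URLs were normalised before lookup.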

This means that for any handle-enabled site you only need to commit to maintaining one teensy bit of redirect code. It would be saying, "Looks like this request is for something I can no longer serve. Let's send it on to the findurl service."
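That teensy bit of code could be as small as a not-found handler. Sketched here as a Python WSGI app, assuming the site routes requests it can no longer serve to this handler, and assuming a findurl endpoint that takes the old URL as a url query parameter; FINDURL_SERVICE and the parameter name are placeholders, since I don't know what the real service will look like.

```python
from urllib.parse import urlencode
from wsgiref.util import request_uri

# Placeholder: the findurl proposal doesn't fix an address or parameter name.
FINDURL_SERVICE = "http://findurl.example/find"


def not_found_handler(environ, start_response):
    """Catch-all for requests this site can no longer serve: hand them to findurl."""
    # Rebuild the full URL the client asked for, query string included.
    old_url = request_uri(environ, include_query=True)
    target = FINDURL_SERVICE + "?" + urlencode({"url": old_url})
    start_response("302 Found", [("Location", target),
                                 ("Content-Type", "text/plain")])
    return [b"Moved; asking findurl where this went.\n"]
```

The point of the design is that this handler knows nothing about the old software's URL scheme, so it doesn't need rewriting each time the repository changes.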

(If the software has had multiple ways of viewing the same record with different URLs, then a smarter redirect than a simple findurl might be appropriate. But for migrating from a well-behaved system, findurl looks to me like a really simple solution.)

A second killer app would be a handle resolver that can act as a proxy rather than a redirector. That is, you'd go to the canonical URL for an item at Monash and, instead of redirecting you, it would fetch the page itself and display it in the browser. So you'd request this:

http://arrow.monash.edu.au/hdl/1959.1/2635

And without any redirect shenanigans you'd be looking at the metadata page for the item. (This is already on the wish list for the software vendor to provide in future.)
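A handle proxy is also easy to sketch. This Python WSGI app maps /hdl/<handle> to a URL, fetches that page server-side and returns it, so the browser stays on the canonical URL. HANDLE_TABLE is a hypothetical stand-in for a real handle lookup, and the fetch function is passed in so the sketch can be tried without a network.

```python
from typing import Callable, Dict, Tuple
from urllib.request import urlopen

# Hypothetical stand-in for a real handle lookup: handle -> current URL.
HANDLE_TABLE: Dict[str, str] = {
    "1959.1/2635": "http://arrowprod.lib.monash.edu.au:8000/access/detail.php?pid=monash:2635&datastream=",
}
PREFIX = "/hdl/"  # canonical URLs look like /hdl/<handle>


def fetch_url(url: str) -> Tuple[str, bytes]:
    """Fetch url and return (Content-Type header, body)."""
    with urlopen(url, timeout=10) as response:
        return response.headers.get("Content-Type", "application/octet-stream"), response.read()


def make_handle_proxy(table: Dict[str, str],
                      fetch: Callable[[str], Tuple[str, bytes]] = fetch_url):
    """Build a WSGI app that serves /hdl/<handle> by proxying, not redirecting."""
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        handle = path[len(PREFIX):] if path.startswith(PREFIX) else ""
        target = table.get(handle)
        if target is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Unknown handle\n"]
        # Fetch server-side: the browser never sees (or bookmarks) target.
        content_type, body = fetch(target)
        start_response("200 OK", [("Content-Type", content_type)])
        return [body]
    return app


# To try it for real (needs network access and a live target URL):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, make_handle_proxy(HANDLE_TABLE)).serve_forever()
```

One wrinkle: relative links inside the fetched page will now resolve against the canonical URL rather than the page's real home, so a production proxy would need to rewrite links or add a base URL.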

This handle-proxy service may also be useful at a cross-institution level. Take the Australian Government. Departments change their names, appear, disappear, merge and split all the time. A government-wide handle-proxy for things that should have a long life and a stable name would be really useful.

There's lots more to think about with handles than I have covered here, obviously. But if you take the Monash approach of hosting your own handles services, then I think you are probably pretty safe.