It seems to me that there is a bit of buzz at the moment around the need for a desktop eResearch tool that can organize your stuff locally and push it up to a managed store.
In no particular order.
There was a conversation between a couple of Sydneysiders on Twitter on the #eresearch channel:
usyd_dpa: @jimrhiz The research groups I work with have schemas and taxonomies. either domain- or project specific (e.g. HISPID, VRA). #eresearch
1 day ago from TweetDeck
jimrhiz: @usyd_dpa For taxonomy tool, is NSDL Registry http://metadataregistry.org/ of any relevance? Need to consider sustainability! #eresearch
1 day ago from twitterrific
jimrhiz: @usyd_dpa Could also look at @iand’s Open Vocab http://is.gd/likA
2 minutes later from web
usyd_dpa: @jimrhiz The research groups I work with have schemas and taxonomies. either domain- or project specific (e.g. HISPID, VRA). #eresearch
5 minutes later from TweetDeck
usyd_dpa: @jimrhiz #eresearch Different disciplines, common need for tools to enable discipline -specific metada & taxonomy creation / maintenence
2 minutes later from TweetDeck
usyd_dpa: @jimrhiz #eresearch … maybe like joomla or plone with a taxon plugin? – so long as its robust, usable, adaptable, export struct. packages
5 minutes later from TweetDeck
usyd_dpa: @jimrhiz #eresearch This sort of tool wouldn’t even need a flash front end. Index, search, presentation could be handled elsewhere.
11 minutes later from TweetDeck
jimrhiz: @usyd_dpa To spare #eresearch, maybe this discussion needs its own tag, say #taxontools
5 minutes later from twitterrific
jimrhiz: @usyd_dpa Are these links from @maheshcr any use: http://is.gd/lir2 ? What about Protégé from Stanford? #taxontools
3 minutes later from web
Subsequently, Rowan Brownlee (usyd_dpa) has started a conversation on the ANDS group asking about tools for researchers to organize their stuff, label it using taxonomies and share it. So far no response to that. I would have expected someone to mention Field Helper, which is from Sydney.
Field Helper is a desktop application that enables you to quickly view and categorise groups of related digital files and then submit the resulting package to a repository for long term preservation and access. Digital repositories require a submission to be formated in a specific way and be described according to a standard meta data encoding schema. Working with Field Helper results in a ZIP file containing compressed versions of your files along with a METS (Metadata Encoding and Transmission Standard) file which contains a detailed description of each file and its relationship to other files in the submission. METS is a standard that works with most repositories and – where required – can be easily translated into a form that non METS compliant repositories can work with.
I have looked at Field Helper in the past – I don’t think that the metadata tagging is going to scale very well and the system for mapping tags onto formal metadata seems a bit clumsy but it does some of what I think Rowan is asking for.
Also in the last few days, Les Carr mourns another lost disk drive and another lost week of work, and tells us why he needs trusted storage. He says:
So an intelligent store should help me understand what I have – a bit like the way that user tools like iPhoto help you understand and organise thousands of images. It should be possible to get a highly distilled overview/representation/summary/visualisation of all my intellectual content/property/achievements as well as a detailed and comprehensive store of all my individual documents and files.
I guess you can see where I’m going with this. I’ve gone and got the ideal desktop storage and the dream repository all mixed up. Well perhaps I have – but why not?
Yes – I can see where Les is going – at least I think he’s hinting at ePrints on the desktop.
I made the same sort of mixup in my mind back in December:
Thinking about this led me to the idea of putting something like The Fascinator on the desktop, letting it find all your stuff, giving you a simple way to organize it into projects, embargo bits of it and so on, and then automate the process of disseminating it to the institutional and other places you’d like it go. I’m thinking of something like Picasa (which finds all your pictures on your hard drive no matter how embarrassing or not safe for work they are) and iTunes which although in my opinion potentially evil has some nice ways of browsing and organizing content, but with a connection to the world wide repository grid. More on this idea soon.
And finally, we have Dorothea Salo with this one-liner. There are a lot of other lines you can read as well in her response to this piece at Library Journal dot com.
Data curation and IR population need to be reframed as collection-development challenges.
Now, maybe Rowan’s plea to the ANDS group will turn up an application that does what we want, but I think we might have to take up the collection-development-challenge.
Dorothea says:
Bluntly, DSpace and EPrints are completely inadequate to meet the data-curation challenges you [Clifford Lynch] outline; and Fedora can mostly do the job, but only with major hacking. This is unacceptable. How can we offer data services when we don’t have basic building-blocks to work with?
Here at the Australian Digital Futures Institute (ADFI) we’re up for a bit of ‘major hacking’, although we find it soothes our management team to call it ‘software development’. That’s why we pretend that I’m the manager of the Software Development Research and Development team, not just the one with the biggest mouth in a feral mob of hackers.
At the moment we are working with Chris Lee’s Public Memory Research Centre on a repository for the humanities. Ultimately it will have creative arts content and research materials – we’re starting with a military history project run by Leonie Jones.
Leonie has been doing exactly what Rowan describes: “They each manage binary and text content on their desktops or departmental fileservers using spreadsheets and/or sql databases”. Leonie has spreadsheets.
We are considering trying the following based on our Fedora-based lightweight repository solution known as The Fascinator.
We’ll start with a local installation of The Fascinator – that puts Fedora 3 and Apache Solr on your desktop. Don’t worry, we have a simple installer. It’s all Java, so it might be painful for the programmers at times but it should install pretty much anywhere.
Then we will add a file-system indexer for The Fascinator – pretty much like what Picasa does, it will index all of your stuff. It will grab whatever metadata it can, including properties from office documents, EXIF metadata and tags from images. We will also treat the file system as a source of metadata so you will be able to explore using metadata facets and file system facets using the same interface. This should be a very straightforward addition to the existing software; it’s just a matter of bolting together some standard software libraries.
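To make the file-system-as-metadata idea concrete, here is a minimal sketch in Python using only the standard library. It is not The Fascinator's indexer: the function names index_tree and facet_counts are made up for illustration, and it only collects what the file system itself knows (folder, extension, size, modification time). Reading EXIF data or office document properties would need extra libraries and is left out.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def index_tree(root):
    """Walk root and return one metadata record per file (hypothetical helper)."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            rel = os.path.relpath(path, root)
            parts = rel.split(os.sep)
            records.append({
                "path": rel,
                # File-system facets: the top-level folder and the file type.
                "top_folder": parts[0] if len(parts) > 1 else "",
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return records


def facet_counts(records, field):
    """Count how many records share each value of field, as a facet browser would."""
    return Counter(record[field] for record in records)
```

Calling facet_counts(index_tree(some_folder), "extension") gives the numbers you would show beside each file type in a facet list; in a real setup the records would presumably be pushed into Solr rather than kept in a Python list.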
Next comes the taxonomy/tagging bit: we need a way to import tag-sets and taxonomies that you might want to apply to your content and then let you tag it. I think it will be important to support both formal metadata and informal tagging. For example, you might want to set up your own tag hierarchy with home/work at the root, and with work broken down into teaching/research and research broken up by project.
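A rough sketch of how such a DIY tag hierarchy might behave, in Python. The assumptions here are mine: tags are written as slash-separated paths such as work/research/projectA, the TagStore class is hypothetical, and tags are held apart from the files rather than written into them. Searching for a tag also finds everything filed beneath it.

```python
def ancestors(tag):
    """Return a tag path plus all its ancestors, root first."""
    parts = tag.strip("/").split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


class TagStore:
    """Hypothetical stand-off store mapping file paths to hierarchical tags."""

    def __init__(self):
        self._tags = {}  # file path -> set of tag paths

    def tag(self, path, tag):
        """Attach a tag path such as 'work/teaching' to a file."""
        self._tags.setdefault(path, set()).add(tag.strip("/"))

    def find(self, tag):
        """Return paths tagged with tag or with any tag beneath it."""
        wanted = tag.strip("/")
        return sorted(
            path
            for path, tags in self._tags.items()
            if any(wanted in ancestors(t) for t in tags)
        )
```

With thesis.odt tagged work/research/projectA and lecture.odp tagged work/teaching, find("work") returns both files while find("work/research") returns only the thesis. A formal ontology would need more than path strings, but the same lookup idea applies.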
I think the tag hierarchy in digiKam is a good start. Here’s a screenshot showing my home-grown tag set applied to my own photos. I think a new tool should allow both ad hoc DIY sets and more formal ontologies.
So we can add a new action for the user:
There are a couple of things we’d like to look at here:
- There is often metadata inherent in the file system. I can point to my Music folder and say ‘that’s all owned by me’.
- There are relationships between files; Leonie has video transcripts with time-codes. If we plug in a smart indexer then we should be able to get our text index to let you find words and jump to the right part of the video. So, we need plugins – which may often have to be one-offs. Ben O’Steen has a great blog post where he talks us through one such curation exercise.
- Which bits of metadata should be written back into the files? For my own images I have been adamant that I want metadata written back into the files, but what if different members of the family wanted to classify things in different ways? Stand-offish metadata may be better if we can build trusted systems that know how to keep things linked up.
- Add ‘playlists’ to group content. This is actually just like what you do in content packaging, for example the organizer in an IMS content package.
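The time-coded transcript idea in the list above could be prototyped along these lines. This is a sketch under a big assumption: that each transcript line starts with an HH:MM:SS time-code followed by the words spoken from that point. I have not seen Leonie's files, so the format and the function names build_index and find_word are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line format: HH:MM:SS followed by the words spoken from that point.
LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(.*)$")


def build_index(transcript):
    """Map each lower-cased word to the sorted offsets (seconds) where it occurs."""
    index = defaultdict(set)
    for line in transcript.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip lines that carry no time-code
        hours, minutes, seconds, text = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(offset)
    return {word: sorted(offsets) for word, offsets in index.items()}


def find_word(index, word):
    """Return the offsets in seconds at which a player could start playback."""
    return index.get(word.lower(), [])
```

A search hit then carries a number of seconds that the video player can seek to. A one-off plugin for a real collection would differ mainly in the parsing step.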
At this point in the story we can index everything, label it, and explore it. Next step would be replicating it up into a cloud of repository services. I can imagine a couple of use cases here:
- Everything with the tag work is to be backed up to the university system. This would be a mirror of what’s on my desk and by default only accessible to me. There’s going to be a lot of data and we can’t leave it all in our houses, offices and labs.
- Everything under /Music is to be backed up to a private data store – maybe via another copy of The Fascinator running at home, or even a copy in the cloud somewhere.
- Everything tagged with ePrints goes to you-know-where (Hi Les).
- Everything tagged with thesis is to be replicated to the departmental thesis repository where my supervisor will be able to see it as well.
- Everything tagged with PMRC goes to the centre’s repository where the repository’s curator can, you know, curate it. This could be as simple as adding a tag, public, which means that it will then be disseminated to the public institutional repository.
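These use cases boil down to a small table of rules: a tag or a folder on one side, a destination on the other. Here is a sketch of that table in Python. The Rule class and the destination names are placeholders I have made up, not real services or Fascinator configuration, and the sketch only decides where an item should go; it does no copying.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical replication rule: match on a tag or on a folder prefix."""
    destination: str
    tag: Optional[str] = None          # match items carrying this tag
    path_prefix: Optional[str] = None  # or items stored under this folder

    def matches(self, path, tags):
        if self.tag is not None and self.tag in tags:
            return True
        if self.path_prefix is not None:
            prefix = self.path_prefix.rstrip("/") + "/"
            return path.startswith(prefix)
        return False


# Destination names are placeholders, not real services.
RULES = [
    Rule("university-backup", tag="work"),
    Rule("private-store", path_prefix="/Music"),
    Rule("eprints", tag="ePrints"),
    Rule("thesis-repository", tag="thesis"),
    Rule("pmrc-repository", tag="PMRC"),
]


def destinations(path, tags, rules=RULES):
    """Return every destination an item should be replicated to, in rule order."""
    return [rule.destination for rule in rules if rule.matches(path, tags)]
```

A thesis chapter tagged both work and thesis would go to the university backup and the thesis repository. Keeping the rules as data rather than code means a user could edit them without touching the replication machinery.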
To get this kind of data federation going I’d look at Atom Archive for the feed mechanism (that’s going to kill off OAI-PMH, right?) with OAI-ORE to package the metadata we’ve added with the source-objects.
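For a feel of what the feed side might look like, here is a minimal sketch that builds a plain Atom feed with Python's standard xml.etree.ElementTree, carrying each item's tags as Atom category elements. It is deliberately incomplete: it omits elements such as author that a valid feed would need, it does not attempt feed archiving or OAI-ORE packaging, and the build_feed function and the item dictionary layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Serialise Atom elements in the default namespace rather than with a prefix.
ET.register_namespace("", ATOM)


def _child(parent, name, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    element = ET.SubElement(parent, f"{{{ATOM}}}{name}", attrs)
    if text is not None:
        element.text = text
    return element


def build_feed(feed_id, title, updated, items):
    """Build an Atom feed string with one entry per item dict.

    Each item is assumed to have 'id', 'title' and 'updated' keys and an
    optional 'tags' list.
    """
    feed = ET.Element(f"{{{ATOM}}}feed")
    _child(feed, "id", feed_id)
    _child(feed, "title", title)
    _child(feed, "updated", updated)
    for item in items:
        entry = _child(feed, "entry")
        _child(entry, "id", item["id"])
        _child(entry, "title", item["title"])
        _child(entry, "updated", item["updated"])
        for tag in item.get("tags", []):
            # Tags travel as Atom category elements.
            _child(entry, "category", term=tag)
    return ET.tostring(feed, encoding="unicode")
```

A harvesting repository could read the category terms to decide what to pull, for example only entries carrying the public tag.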
One feature of this setup which might not be immediately obvious is having the same kind of repository interface on the desk as you would have in a web based repository. The idea is to encourage people to see their research data as ‘in the repository’ from the moment of creation and to be able to take control over how their stuff is disseminated. This is like the ICE approach of previewing early and often so that people get used to seeing their documents both in paper format and web format.
There are some free desktop applications that do some of what I’m talking about here. Duncan Dickinson is collecting a list of them. They include the aforementioned Field Helper and things like Mendeley which manages research papers, but not, it seems, research data.
And let’s be clear here. We have never, ever been under the illusion that if we build it they will come, not with our ePrints system, not with ICE, not with any system. We know that if we decide to build then we will have to build something that either the users need, or didn’t know they needed. In this case I am betting the first selling point is the backup feature but funder requirements to place data in repositories may be a motivator in the distant future.
Should we try building this thing? It would only take a few weeks to prototype.




It’s a strong idea, Peter. Several questions and comments:
1. What is the scope of content? The examples you mention are music, videos, ePrints, theses, which tend to have known metadata. How would you handle defining schemas and then entry of formal metadata for arbitrary research datasets? The suggestions I gave in the Twitter discussion were attempts at pointers to ontology creation and maintenance tools, also present in Fieldhelper as you mention (and maybe Mediaflux), in other words, ways to deal with the initial stage of Rowan’s problem.
2. “We’ll start with a local installation of The Fascinator – that puts Fedora 3 and Apache Solr on your desktop.” There’s a concern that this leaves the data on the C: drive for now, and delays bringing them into a well-backed-up environment — not necessarily a “repository”, but a robust data store (like Monash LaRDS, VeRSI’s federated data store, or the ARCS Data Fabric).
3. Your last diagram has a relation to a full “data lifecycle”: see Andrew Treloar’s diagram in my presentation at #DLCsyd09 http://tinyurl.com/betk3l . This sounds as if it’s on the right track! However, a lot of flexibility for different requirements in different cases is needed (access levels for different collaborator groups; version control; …). Can you allow for these needs and keep options open, so as to avoid the risk that by building a system rapidly prior to an overall data store and national data commons context, you are fixing too many choices prematurely?
Long live the feral hacker mobs! It’s the only way anything gets DONE in this space.
Here, have some mouth-foam on me.
Hi Pete,
A couple of observations if I may.
The ANDS general google group is not intended to be a discussion list; rather, it’s there for announcements.
More importantly your latest ideas certainly resonate with my thinking.
Your Picasa-like approach could indeed work but I think it should have an important component added, namely the ability to create or infer collection level RIF-CS metadata based on files organised and stored in locations/folders by the user. Another important addition would be a persistent identifier minting service. Lastly, the ability to work across local, networked and remote data stores should also be considered.
I know you use these posts to harvest ideas and stir the pot; consider your aims partly accomplished.
Cheers
Neil
[...] Desktop eResearch Revolution http://ptsefton.com/2009/03/05/desktop-eresearch-revolution.htm [...]
I only just saw this. Clearly something’s not working right in the interweb, probably somewhere in the vicinity of Bloglines, Google Groups or email.
Anyway, this looks really interesting. I’d say go for it with the prototype.
Pity about the Java though. What about Java/Python interfaces, Jython…? Can you maybe code in Python and link to existing Java?
A few random thoughts:
The task here gets much more ambitious as you move it in the direction of formal curation, publishing, persistent identifiers and so on. (See Neil’s comment.) The whole federation idea is really attractive but I’ve got the feeling that as items move from the individual’s desktop to public repositories and metadata gets disseminated to federated registries, there will need to be human curation activity along the way. I’m not sure if a fully automated system will be feasible… But I can certainly imagine something that assists the individual in publishing their stuff to shared spaces, by pre-filling most of the data in the ingest forms and flagging what’s missing. But even if that bit of it is some time in the future, the repository-on-the-desktop idea seems valuable to me anyway.
For making it useful and attractive to users, I’d say the first bit of networking would be to build in automated backup to a secure location. (Synchronisation?)
You didn’t mention email. I think it would be really cool to have all my email sucked up and indexed and sorted and tagged and stored and backed up seamlessly along with all my other stuff. (Rather than it being separate.)
It’d be good also to have a simple way to go from looking at something in the repository to opening it in its owner application. You’d want to be able to see it through the repository, not just metadata but content (preferably HTML-converted), but also to click through to opening it for action. Of course this raises issues around version control, not just revisions of a document over time, but also the relationship between different formats: the Word document, the PDF, the blog post HTML. Unless people are using ICE for all their writing, they’re likely to have a whole mess of different versions of everything lying around. Can a desktop repository help sort out that mess?
Sorry, that was all a bit stream-of-consciousness, hope it makes sense…
[...] — ptsefton @ 11:41 am Less than two weeks ago I posted here on what I called the Desktop eRearch revolution. Jim Richardson gave that resulting discussion the twitter tag #DTeRrev. You can see what people [...]