Ozmeka: extending the Omeka repository to make linked-data research data collections for (any and) all research disciplines

Creative Commons License
Ozmeka: extending the Omeka repository to make linked-data research data collections for (any and) all research disciplines by Peter Sefton, Sharyn Wise, Katrina Trewin is licensed under a Creative Commons Attribution 4.0 International License.

[Update 2015-06-11, fixing typos]

Ozmeka: extending the Omeka repository to make linked-data research data collections for (any and) all research disciplines

Peter Sefton, University of Technology, Sydney,

Sharyn Wise, University of Technology of Sydney,

Peter Bugeia, Intersect Australia Ltd, Sydney,

Katrina Trewin, University of Western Sydney,

There have been some adjustments to the authorship of this presentation: Peter Bugeia was on the abstract but did not end up contributing to the presentation, while Katrina Trewin withdrew her name from the proposal for a while, but then produced the Farms to Freeways collection and decided to come back into the fold. The notes here are written in the first person, to be delivered in this instance by Peter, but they come from all of the authors.

Abstract as submitted

The Ozmeka project is an Australian open source project to extend the Omeka repository system. Our aim is to support Open Scholarship, Open Science, and Cultural Heritage via repository software that can manage a wide range of Research (and Open) Data, both Open and access-restricted, providing rich repository services for the gathering, curation and publishing of diverse data sets. The Ozmeka project places a great deal of importance on integrating with external systems, to ensure that research data is linked to its context, and that high quality identifiers are used for as much metadata as possible. This will include links to the ‘traditional’ staples of the Open Repositories conference series, publications repositories, and to the growing number of institutional and discipline research data repositories.

In this presentation we will take a critical look at how the Omeka system, extended with Ozmeka plugins and themes, can be used to manage (a) a large cross-disciplinary archive of research data about water resources, (b) an ethno-historiography built around a published book and (c) large research data sets in a scientific institute, and talk about how this work paves the way for eResearch and repository support teams to supply similar services to researchers in a wide variety of fields. This work is intended to reduce the cost and complexity of creating new research data repository systems.

Slightly different scope now

I will be talking about Dharmae, the database of water-resources-themed research data; the project to put the book data into Omeka took a different turn, and the scientific data repository is still being developed.

How does this presentation fit in to the conference?

Which Conference Themes are we touching on?

  • Supporting Open Scholarship, Open Science, and Cultural Heritage

  • Managing Research (and Open) Data

  • Building the Perfect Repository

  • Integrating with External Systems

  • Re-using Repository Content

Things we want to cover:

  • A bit about the research data projects we’ve worked on.

  • How we’ve implemented Linked Data for metadata (stamping out strings!)

  • What about this Omeka thing?

(The picture is one I took of the conference hotel)

What’s Omeka?

We like to call Omeka the “WordPress of repositories”

It’s a PHP application which is easy to install and get up and running. And yes, it is a ‘repository’: it lets you upload digital objects and describe them with Dublin Core metadata. And no, it’s not perfect.

The Perfect Repository?

So let’s talk about this phrase “the perfect repository”. I have been following Jason Scott at the Internet Archive (who would make a great keynote speaker for this conference, by the way) and his work on rescuing and making available cultural heritage such as computer-related ephemera and programs for obsolete computing and gaming platforms. He uses the phrase “Perfect is the enemy of done” and talks about how making some tradeoffs and compromises and then just doing it means that stuff, you know, actually gets done that otherwise wouldn’t.

No, we’re not calling Omeka “third best”, but one of the points of this talk is that instead of waiting for or trying to build the ‘perfect’ research data repository Omeka is a low-barrier-to-entry, cheap way to build some kinds of working-data-repositories or data-publishing websites. I have talked to quite a few people who say they have looked at Omeka and decided that it is too simple, too limited for whatever project they were doing. Indeed, it does have some limitations; the two big ones are that it does not handle access control at all and it has no approval workflow, at least not in this version.

The quote on the slide is via the Wikipedia page Perfect is the Enemy of Good.

The Portland Common Data Model

Omeka more-or-less implements a subset of the Portland Common Data Model, which I was introduced to yesterday in the Fedora workshop, although as I just mentioned it is not strong on Access control, having only a published/unpublished flag on items.

Why Omeka? We’ll come back to this, but the ability to do Linked Data was one of the main attractions of Omeka. We had to add some code to make the relations appear like this, and to make them easier to apply than in the ‘trunk’ version of Omeka 2.x, but that development was not hard or expensive compared to what it might have cost on top of other repository systems with more complex application stacks.

(Note – if you look at the current version of Dharmae, the item relations will appear a little differently, as not all the Ozmeka enhanced code has been rolled out).

Australian national data service (ANDS) funded project

… to kick-start a major open data collection

I’m going to give you a quick backgrounder on our project by way of introduction: ANDS approached us with a funding opportunity to create an open data collection. Many of you will be familiar with the frustrations of funding rules: our constraint was that we were not allowed to develop software, although we could adapt it.

The UTS team put the word out for publishable research data collections but got little response. Then, thanks to the library eScholarship team, Sharyn met Professor Heather Goodall and Jodi Frawley, who had data from a large Oral History project on the impacts of water management on stakeholders in the Murray Darling Basin – called Talking Fish.

And they had had the amazing foresight – the foresight of the historian – to obtain informed consent to publicly archive the interview data.

Field science in MDB (from Dharmae)

In the image above MDB means the Murray Darling Basin, a big, long river system with hardly any water in it.

First up I’ll talk about Dharmae. It was conceived as a multi-disciplinary data hub themed around water-related research, with the “ecocultures” concept intended to flag that we welcome scientific data contributors (ecological or otherwise) as well as cultural data, because both are equally crucial if we want research to have an impact on the world.

This position is also supported in the literature of the intergovernmental science policy community and environmental sustainability and resilience research.

One paper expressed it this way: for research to have a transformative impact, it’s not simply more knowledge that we need, but different types of knowledge.

The literature emphasizes the need for improved connectivity between knowledge systems: those applied to researching the natural world, such as science, and those that investigate socio-cultural practices such as social sciences, history and particularly also indigenous knowledge.

But because these different knowledge systems each come with their own practices and terminologies, we have an interesting information science problem:

How to support data deposit and discovery by users from all disciplines?

Linked data & disambiguation

Essentially by using linked data. We extended the open source repository Omeka by allowing all named entities (places, people, species and so on) to be linked to an authoritative source of truth.

Let’s take location – it is one of the obvious correspondences between scientific and cultural data.

That still doesn’t mean it’s an easy thing to link on. Place names are rarely unique, as we see Kendell noticing above.

But by using authoritative sources, like Geonames, we can disambiguate place names, and better still we can derive their coordinates.
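
To make the disambiguation step concrete, here is a minimal sketch of picking the right authority record for an ambiguous place name and deriving its coordinates. The candidate records imitate GeoNames-style entries, but the URIs, states and coordinates below are illustrative, not real lookups:

```python
# Sketch: disambiguating a place name against an authority source.
# The candidates imitate GeoNames-style records; URIs and coordinates
# are made up for illustration.

CANDIDATES = {
    "Darling River": [
        {"uri": "http://sws.geonames.org/0000001/",
         "name": "Darling River", "state": "NSW",
         "lat": -34.08, "lng": 141.91},
        {"uri": "http://sws.geonames.org/0000002/",
         "name": "Darling River", "state": "WA",
         "lat": -17.38, "lng": 123.64},
    ],
}

def disambiguate(place_name, state):
    """Pick the candidate in the given state; return its URI and coordinates."""
    for record in CANDIDATES.get(place_name, []):
        if record["state"] == state:
            return record["uri"], (record["lat"], record["lng"])
    return None

uri, coords = disambiguate("Darling River", "NSW")
```

Once a record carries the URI rather than the bare string, the coordinates come along for free, which is what makes map-based discovery possible.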

Now we want users of Dharmae who are interested in finding data by location to access it in the way that makes sense to them – and that may not be name.

Lower Darling/Anabranch

In Dharmae readers can search by place name or they can use a map.

Here is one of 12 study regions from the Talking Fish data, showing the Lower Darling and Anabranch above Murray Sunset National Park.

We georeferenced these regions using a GeoNode map server, but we have superimposed the researchers’ hand-drawn map as a layer on top to preserve the sense of human-scale interaction.

You can click through from here to read or listen to the oral histories completed in this region, look at photos or investigate the species identified by participants.

You can also search by Indigenous language Community if you prefer.

How else could this be useful?

Lower Darling/Anabranch:

It just so happens that we also have a satellite remote sensing dataset that corresponds reasonably well to this region above the national park.

It shows the Normalized Difference Vegetation Index for the region – that is, the vegetation change over the decade 1996–2006.

Relative increase in vegetation shows as green and relative decrease as pink.

Could the interviews with participants from that region provide any clues as to why?

I can’t tell you that, but the point is that the more we enrich and link data, the more possible hypotheses we can generate.

The Graph

Here’s the graph of our solution: We created local records, so that the Dharmae hub could maintain its own set of ‘world-views’ while still interfacing with the semantic web knowledge graph.

This design pattern is something we want to explore more: having a local record for an entity or concept, with links to external authorities. So, for example, we might use a DBpedia URI for a person, and quote a particular ‘approved’ version of the Wikipedia page about them, so there is a local, stable proxy for an external URI, but the local record is still part of the global graph. With the species data, this will allow researchers to explore the way the participants in Talking Fish talked about fish and compare this to what the Atlas of Living Australia says about nomenclature and distribution.
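
The local-proxy pattern above can be sketched as a small record-building function. Everything here is illustrative: the local namespace, the property names and the example URIs are assumptions, not the actual Dharmae data model:

```python
# Sketch of the "local proxy record" pattern: a local item with its own
# identifier that links out to an external authority, optionally pinning
# an 'approved' snapshot of the external description. All names and URIs
# below are hypothetical.

def make_local_record(local_id, label, external_uri, approved_version=None):
    record = {
        "@id": "http://dharmae.example.org/entity/%s" % local_id,  # hypothetical namespace
        "rdfs:label": label,
        "owl:sameAs": external_uri,          # link to the external authority
    }
    if approved_version:
        # pin a particular vetted version of the external page
        record["prov:wasQuotedFrom"] = approved_version
    return record

person = make_local_record(
    "example-person", "Example Person",
    "http://dbpedia.org/resource/Example_Person",
    approved_version="https://en.wikipedia.org/w/index.php?title=Example_Person&oldid=123")
```

The key design choice is that the local record stays stable even if the external authority changes or disappears, while the `owl:sameAs` link keeps it part of the global graph.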

From the Journey to Horseshoe Bend website at the University of Western Sydney:

TGH Strehlow’s biographical memoir, Journey to Horseshoe Bend, is a vivid ethno-historiographic account of the Aboriginal, settler and Lutheran communities of Central Australia in the 1920’s. The ‘Journey to Horseshoe Bend’ project elaborates on Strehlow’s book in the form of an extensive digital hub – a database and website – that seeks to ‘visualise’ the key textual thematics of Arrernte* identity and sense of “place”, combined with a re-mapping of European and Aboriginal archival objects related to the book’s social and cultural histories.

Thus far the project has produced a valuable collection of unique historical and contemporary materials developed to encourage knowledge sharing and to initiate knowledge creation. By bringing together a wide variety of media – including photographs, letters, journals, Government files, audio recordings, moving images, newspaper, newsletters, interviews, manuscripts, an electronic version of the text and annotations – the researchers hope to ‘open out’ the histories of Central Australia’s Aboriginal, settler and missionary communities.

JTHB research work entailed creating annotations relating to sections of the book text. The existing book text, marked up with TEI, was converted to HTML and the annotations were anchored within the HTML. The plan was to create an Omeka plugin to display the text and co-display or footnote the annotations relating to each part of the text.


  • The existing annotations were incomplete and the research team wished to continue adding annotations and material. This meant that the HTML would need to be continuously edited (outside Omeka), giving rise to issues around workflow, researcher skills, and version control.
  • Cultural sensitivities were also a barrier to open publication (not an Omeka issue but a MODC one)

Katrina Trewin is a data librarian working at the University of Western Sydney. While the Journey to Horseshoe Bend project could not be completed using Omeka due to resource constraints, another project could be. Using Omeka, Katrina was able to build a website around an oral-history data set without needing any development. This work took place in parallel with the work on Dharmae at UTS, so it was not able to make use of some of the innovations introduced in that project, such as enhancements to the Item Relations plugin to allow rich interlinking between resources.

Katrina’s notes:

Material had been in care of researcher for 20+ years.

  • Audio interviews on cassette, photographs, transcripts (some electronic)
  • Digitised all the material
  • Created csv files for upload of item metadata into Omeka
  • Once collections of items were created, then used exhibit plugin to bring material relating to each interviewee together.

This worked well because the collection was complete – it is fine to edit metadata in Omeka, but the items themselves need to be stable (unlike the JTHB text).

Omeka allows item-level description, which is not possible via the institutional repository. This could have been done in the Omeka interface but was more efficient via CSV upload. The CSV files, bundled item files, readme and Omeka XML output were made available from the institutional repository record for longer-term availability, as a hosting arrangement is not in place: Chambers, Deborah; Liston, Carol; Wieneke, Christine (2015): Interview material from Western Sydney women’s oral history project: ‘From farms to freeways: Women’s memories of Western Sydney’. University of Western Sydney.
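
The CSV-upload step might look something like the sketch below, which writes item metadata in the shape Omeka’s CSV Import plugin can map onto Dublin Core elements. The column headings and the sample row are assumptions for illustration, not the actual Farms to Freeways files:

```python
import csv
import io

# Sketch: preparing item metadata for bulk upload into Omeka via its
# CSV Import plugin. Column names are assumed Dublin Core mappings;
# the interview row is illustrative, not the real data.

FIELDS = ["Dublin Core:Title", "Dublin Core:Creator",
          "Dublin Core:Date", "Dublin Core:Format", "file"]

items = [
    {"Dublin Core:Title": "Interview with participant 1 (audio)",
     "Dublin Core:Creator": "Chambers, Deborah",
     "Dublin Core:Date": "1990",
     "Dublin Core:Format": "audio/mp3",
     "file": "interview01.mp3"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(items)
csv_text = buf.getvalue()  # ready to save and feed to the importer
```

The appeal of this route is repeatability: the same CSV that drives the Omeka import can be bundled, as described above, into the institutional repository record.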

Katrina and team have published all the data as a set of files, with a link to the website, in the institutional research data repository. This screenshot shows the data files available for download for re-use. My team at UTS is doing a similar thing with the Dharmae data.

At UTS we are constructing a growing ‘grid’ of research data services. This diagram is a sketch of how Omeka fits into this bigger picture, showing the GeoNode mapping service which supplies map display services and can harvest maps from Omeka as well. In this architecture, all items ultimately end up in an archival repository with a catalogue description, as I showed earlier for the Farms to Freeways data.

Interested? Check out or clone our Ozmeka GitHub repositories.


Omeka is a simple-seeming repository solution which is easy to dismiss for projects that demand the ‘perfect’ repository, but looking beyond its limitations it has some strengths that make it attractive for creating ‘micro repository services’ (Field & McSweeney 2014). Our work has made it easier to set up new research-data repositories that adhere to linked-data principles and create rich semantic web interfaces to data collections. This paves the way for a new generation of micro or workgroup-level research data repositories which link to and re-use a wide range of data sources.


Johnson, Ian. “Heurist Scholar,” 2014.

Kucsma, Jason, Kevin Reiss, and Angela Sidman. “Using Omeka to Build Digital Collections: The METRO Case Study.” D-Lib Magazine 16, no. 3/4 (March 2010). doi:10.1045/march2010-kucsma.

Nahl, Diane. “A Discourse Analysis Technique for Charting the Flow of Micro-Information Behavior.” Journal of Documentation 63, no. 3 (2007): 323–39.

Palmer, Carole L., and Melissa H. Cragin. “Scholarship and Disciplinary Practices.” Annual Review of Information Science and Technology 42, no. 1 (2008): 163–212. doi:10.1002/aris.2008.1440420112.

Palmer, Carole L. “Thematic Research Collections”, Chapter 24 in Schreibman, Susan, Ray Siemens, and John Unsworth. Companion to Digital Humanities (Blackwell Companions to Literature and Culture). Hardcover. Blackwell Companions to Literature and Culture. Oxford: Blackwell Publishing Professional, 2004.

Simon, Herbert. “Rational Choice and the Structure of the Environment.” Psychological Review 63, no. 2 (1956): 129–38.

Strehlow, Theodor George Henry. Journey to Horseshoe Bend. [Sydney]: Angus and Robertson, 1969.


  • Researchers:

    • Prof. Heather Goodall
    • Dr Michelle Voyer
    • Associate professor Carol Liston
    • Dr Jodi Frawley
    • Dr Kevin Davies

  • eResearch: Sharyn Wise, Peter Sefton, Mike Lynch, Paul Nguyen, Mike Lake, Carmi Cronje, Thom McIntyre and Kevin Davies, Kim Heckenberg, Andrew Leahy, Lloyd Harischandra

  • Library: Duncan Loxton (eScholarship) & Kendell Powell (Aboriginal & Torres Strait Islander Data Archive Officer), Katrina Trewin, Michael Gonzalez

  • Thanks to: State Library of NSW Indigenous Unit, Atlas of Living Australia, Terrestrial Ecosystems Research Network and our funder, ANDS.

I didn’t have this slide when I presented, and forgot to acknowledge the contribution of all of the above, and anyone who’s been left off by accident.

Implementing ORCID and other linked-data metadata in repositories

Implementing linked data metadata systems

Peter Sefton

University of Technology, Sydney

This is a short presentation for the ORCID (Open Researcher and Contributor ID) implementers Roundtable in Canberra April 14, 2015.

The ORCID site says:

ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized.

I will post this so I can present – and come back later to expand, and clean up typos, so this post will evolve a bit.

This event is launching a document:

The ‘Joint Statement of Principle: ORCID – Connecting Researchers and Research‘ [PDF 297KB] proposes that Australia’s research sector broadly embrace the use of ORCID (Open Researcher and Contributor ID) as a common researcher identifier. The statement was drafted by a small working group coordinated by the Australian National Data Service (ANDS) comprised of representatives from Universities Australia (UA), the Council of Australian University Librarians (CAUL) and the Australasian Research Management Society (ARMS). Representatives of the Australian Research Council and the National Health and Medical Research Council also provided input through the working group.

In this presentation I talk about some of the details of how to implement ORCID. Just how do you use an ORCID ID in an institutional repository?

This is not that simple, as most of our systems are set up to expect string-values for names, not IDs.

This talk is not all about ORCIDs…

… it’s about implementing linked data principles

This talk is about why ORCIDs are important, as part of the linked-data web. I will give examples of some of the work that’s going on at UTS and other institutions on linked-data approaches to research data management and research data publishing and conclude with some comments about the kinds of services I think ORCID needs to offer.

Modern metadata should be linked-data

  • Thou Shalt Have No Data Without Metadata

  • RDF is best practice for Metadata

  • Use Metadata Standards where they exist

  • Use URIs rather than Scalars (eg Strings) as names

  • Name all data and metadata ASAP

Clarke and Sefton 2014

But using URIs as names for things is not easy to do. Most repository software doesn’t support this out of the box, and it’s difficult to graft URIs onto a lot of existing metadata schemas, such as the ubiquitous Dublin Core, which in its simple, flat form has no way for URIs to be used as metadata values.

And while it’s easy enough to say “RDF is best practice for Metadata”, entering RDF metadata is non-trivial for humans. So I wanted to show you some of the work we’ve been doing to make it possible to build research data systems that are compliant with the above principles.



Screenshot from the UTS Stash data catalogue showing a party lookup, to get a URI that identifies a person.

The ANDS-funded ReDBOX project embraced linked metadata principles from the very beginning: it has a name-authority service, Mint, which is a clearing house for sources of truth about people, organisational units, subject codes, grant codes etc.

But the ReDBOX/Mint partnership is a very close one; there’s no general way to look up other name authorities without loading them into Mint.


In 2014 I asked, what if there were a general way to do this, so that we could use URIs from a wide range of sources, and a team of developers from NZ and the UK responded as part of the Developer Challenge Event at that year’s Open Repositories conference in Helsinki, supported by Rob Peters from ORCID, who is at this meeting in Canberra.

Slide by:

  • Adam Field : iSolutions, University of Southampton
  • Claire Knowles: Library Digital Development Manager, University of Edinburgh
  • Kim Shepherd: Digital Development, University of Auckland Library
  • Jared Watts: Digital Development, University of Auckland Library
  • Jiadi Yao: EPrints Services, University of Southampton

Enter Fill My List

Members of the Fill My List team (minus Claire Knowles who took the pic) hard at work at Open Repositories 2014


See their git repo.

This modest GitHub repository might not look like much, but as far as I know, it’s the first example of an attempt to create an easy-to-use protocol for web developers to make lookup services.

Fill My List enables auto-complete lookup to multiple sources of truth including ORCID, so a user can find the particular Lee or Smith they want to assert is a creator of a work, specify which kind of Python they mean for the subject of a work, and get a URI. The FML team did prototype implementations for the EPrints and DSpace software.


Looking up the Schools Online Thesaurus (ScOT) for the URI for “Billabongs”.



We have since picked FML up at UTS, and are using it to make high quality metadata for our Australian National Data Service funded Major Open Data Collections project.

The above screenshot shows a prototype lookup service which shows auto-complete hints as you type.

Note that typing “Oxb” finds the same URI – billabongs are also known as ‘oxbow lakes’.

Note that in the screenshot you can see one of the important changes we made to the Omeka repository software to support linked data, as part of the Ozmeka project.

Instead of just a string field for the subject there is a URI as well. So, even though some records might say “Billabongs” and some might say “Oxbow lakes” both would have the same URI.
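
The behaviour described above can be sketched as a tiny lookup function: several labels, one identifier. The vocabulary below is a toy stand-in for a ScOT-style thesaurus, and the URI is illustrative:

```python
# Sketch: an autocomplete lookup where preferred and alternative labels
# both resolve to the same URI. The vocabulary is a toy stand-in for a
# ScOT-style thesaurus; the URI is made up.

VOCAB = {
    "http://vocabulary.example.org/term/billabongs": {
        "prefLabel": "Billabongs",
        "altLabels": ["Oxbow lakes"],
    },
}

def lookup(prefix):
    """Return (label, uri) suggestions whose pref or alt label starts with prefix."""
    prefix = prefix.lower()
    hits = []
    for uri, term in VOCAB.items():
        for label in [term["prefLabel"]] + term["altLabels"]:
            if label.lower().startswith(prefix):
                hits.append((label, uri))
    return hits

# Typing "Bil" or "Oxb" surfaces different labels but the same URI,
# so records using either string end up linked to one identifier.
```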

Note that to make this work we had to hack the Omeka software we’re using, because like most repository software it didn’t have good support for using URIs as metadata values.

So, why am I telling you all this?


The raw, machine-readable Fill My List protocol in action, looking up an ORCID index.

When we refer to researchers on the web, we should use their ORCID ID, in the form of a URI. But to be able to do this we often have to update repository software (as my team at UTS is doing with Omeka).
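
Since an ORCID iD carries its own check digit (ISO 7064 MOD 11-2, per ORCID’s published identifier structure), a repository can validate an iD before minting the URI form. A minimal sketch, using ORCID’s published example iD:

```python
# Sketch: validate an ORCID iD's check digit (ISO 7064 MOD 11-2) and
# turn it into its URI form.

def orcid_check_digit(base15):
    """Compute the check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def orcid_to_uri(orcid):
    """Return the https URI form of a valid ORCID iD, or raise ValueError."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or orcid_check_digit(digits[:15]) != digits[15]:
        raise ValueError("not a valid ORCID iD: %s" % orcid)
    return "https://orcid.org/" + orcid

# 0000-0002-1825-0097 is the example iD from ORCID's documentation
uri = orcid_to_uri("0000-0002-1825-0097")
```

Validating at entry time, before the string becomes a URI in a metadata record, is one small way to keep “use URIs as names” from degrading back into “use strings”.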

In conclusion

The ORCID API (machine to machine interface) provides pretty good but not perfect open lookup services so…

Repository developers can make their repositories linked-data compliant

But it’s a lot of work and it will involve a community effort to update our repository systems, many of which are open source.

We have made a lot of progress on improving the quality of metadata in the Australian research sector – and a lot of that has been community driven; for example, the ReDBOX project’s insistence on using URIs led to the first URIs for Australian grants being minted by developers from Griffith and USQ, because it was the right thing to do.

Now, a few years later, the government is minting its own URIs and the Australian National Data Service is providing vocabulary services.

ORCID does have a public API that allows us to build Fill My List-type lookup services, allowing queries on name parts. It would be better if it included bibliographic information, which might help someone entering metadata choose between two people with the same name.

A quick open letter to eResearch@UTS

2015-02-17, from a tent at the University of Melbourne

Hi team,

Thanks for a great first week last week and thanks for the lunch Peter Gale – I think I counted 12 of us around the table. I thought the week went well, and I actually got to help out with a couple of things, but you’ll all be carrying most of the load for a little while yet while I figure out where the toilets are, read through those delightful directives, policies and procedures that are listed in the induction pack, and try to catch up with all the work that’s already going on and the systems you have in place. All of you, be sure to let me know if there’s something I should be doing to start pulling my weight.

As you know, I have immediately nicked-off to Melbourne for a few days. Thought I might explain what that’s about.

I am at the Research Bazaar conference, #Resbaz.

What’s a resbaz?

The site says:

The Research Bazaar Conference (#ResBaz) aims to kick-start a training programme in Australia assuring the next generation of researchers are equipped with the digital skills and tools to make their research better.

This event builds on the successful Doctoral Training programmes by research councils in the UK [1] and funding agencies in the USA [2]. We are also looking to borrow the chillaxed vibe of events like the O’Reilly Science ‘Foo Events’ [3].

So what exactly is ResBaz?

ResBaz is two kinds of events in one:

  1. ResBaz is an academic training conference (i.e. think of this event as a giant Genius Bar at an Apple store), where research students and early career researchers can come to acquire the digital skills (e.g. computer programming, data analysis, etc.) that underpin modern research. Some of this training will be delivered in the ‘hands-on’ workshop style of Mozilla’s global ‘Software Carpentry’ bootcamps.

You can get hands-on support like at an Apple Store’s Genius Bar!

  2. ResBaz is a social event where researchers can come together to network, make new friends, and form collaborations. We’re even trying to provide a camping site on campus for those researchers who are coming on their own penny or just like camping (dorm rooms at one of the Colleges will be a backup)! We have some really fun activities planned around the event, from food trucks to tent BoFs and watching movies al fresco!

It’s also an ongoing research-training / eResearch rollout program at Melbourne Uni.

But what are you doing there Petie?

On Monday I did three main things apart from the usual conference networking, meeting people stuff.

Soaked up the atmosphere, observed how the thing is run, and talked to people about how to run eResearch training programs

David Flanders wants us to run a similar event in Sydney, I think that’s a good idea, he and I talked about how to get this kind of program funded internally and what resources you need to make it happen.

Arna from Swinburne told me about a ResBaz-like model at Berkeley where they use part-time postdocs to drive eResearch uptake. This is a bit different from the Melbourne Uni approach of working with postgrads.

Attended the NLTK training session

This involves working through a series of text-processing exercises in an online Python shell, IPython. I’m really interested in this one, not just ’cos of my extremely rusty PhD in something resembling computational linguistics, but because of the number of different researchers from different disciplines who will be able to use this for text-mining, text processing and text characterisation.

Jeff, can you please let the Intersect snap-deploy team know about DIT4C – which lets you create a kind of virtualised computer lab for workshops, and, I guess, for real work, via some Docker voodoo. (Jeff Christiansen, is the UTS eResearch Analyst, supplied by our eResearch partner Intersect).

Met with Shuttleworth Fellow Peter Murray-Rust and the head of Mozilla’s science lab Kaitlin Thaney

We wanted to talk about Scholarly HTML: how can we get scholarship to be of the web, in rich, content-minable semantic markup, rather than just barely on the web? Even just simple things like linking authors’ names to their identifiers would be a huge improvement over the current identity guessing games we play with PDFs and low-quality bibliographic metadata.

Kaitlin asked PMR and me where we should start with this: where would the benefits be most apparent, and the uptake most enthusiastic? It’s sad, but the obvious benefits of HTML (like, say, being able to read an article on a mobile phone) are not enough to change the scholarly publishing machine.

We’ve been working on this for a long time, and we know that getting mainstream publisher uptake is almost impossible – but we think it’s worth visiting the Open Educational Resources movement and looking at textbooks and course materials, where the audience want interactive eBooks and rich materials (even if they’re packaged as apps, HTML is still the way to build them). There’s also a lot of opportunity with NGO and university reports where impact and reach are important, and with the reproducible-research crowd who want to do things the right way.

I think there are some great opportunities for UTS in this space, as we have Australia’s biggest stable of Open Access journals, a great basis on which to explore new publishing models and delivery mechanisms.

I put an idea to Kaitlin which might result in a really useful new tool. She’s got the influence at Mozilla and can mobilise an army of coders. I hope there’s more to report on that.

Kaitlin also knows how to do flattery:


  • Need to talk to Deb Verhoeven from Deakin about the new Ozmeka project, an open collaboration to adapt the humanities-focussed Omeka repository software for working-data repositories for a variety of research disciplines. So far we have UWS and UTS contributing to the project, but we’d love other Australian and global collaborators.

  • Find out how to use NLTK to do named-entity recognition / semantic tagging on stuff like species and common-names for animals, specifically fish, for a project we have running at UTS.

    This project takes a thematic approach to building a data collection, selecting data from UTS research relating to water to build a ‘Data Hub of Australian Research into Marine and Aquatic Ecocultures’ (Dharmae). UTS produces a range of research involving water across multiple disciplines: concerning water as a resource, habitat, environment, or cultural and migratory medium. The concept of ‘ecocultures’ will guide collection development which acknowledges the interdependence of nature and culture, and recognises that a multi-disciplinary approach is required to produce transformational research. Rather than privilege a particular discipline or knowledge system (e.g. science, history, traditional indigenous knowledge, etc), Dharmae will be an open knowledge arena for research data from all disciplines, with the aim of supporting multi-disciplinary enquiry and provoking cross-disciplinary research questions.

    Dharmae will be seeded with two significant data collections, a large oral history project concerning the Murray Darling Basin, and social science research examining how NSW coastal residents value the coast. These collections will be linked to related external research data collections such as those on TERN, AODN, and, thanks to the generous participation of indigenous Australians in both studies, to the State Library of NSW indigenous data collections. Dharmae will continue to develop beyond the term of this project.

  • Make sure Steve from Melbourne meets people who can help him solve his RAM problem by showing him how to access the NeCTAR cloud and HPC services.
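On the named-entity idea above: NLTK’s built-in chunker targets people, organisations and places rather than taxa, so a first prototype would likely be gazetteer matching against a list of common fish names. Here is a minimal sketch of that approach; the two-entry gazetteer and the `tag_species` function are illustrative inventions, not part of any existing Dharmae code, and a real deployment would load names from a taxonomy service.

```python
import re

# Hypothetical mini-gazetteer mapping common names to scientific names;
# a real system would load thousands of entries from a taxonomy service.
FISH_GAZETTEER = {
    "murray cod": "Maccullochella peelii",
    "golden perch": "Macquaria ambigua",
}

def tag_species(text):
    """Return (matched text, scientific name, start, end) for each hit, in order."""
    hits = []
    for common, scientific in FISH_GAZETTEER.items():
        # Whole-word, case-insensitive matching so "cod" alone is not tagged.
        for m in re.finditer(r"\b" + re.escape(common) + r"\b", text, re.IGNORECASE):
            hits.append((m.group(0), scientific, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])

print(tag_species("Murray cod and golden perch were recorded at the site."))
```

NLTK would come in for the harder parts this sketch skips: tokenisation, part-of-speech filtering, and disambiguating names that are also ordinary words.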

Letter of resignation

to: The boss

cc: Another senior person, HR

date: 2014-12-10

Dear <boss>,

As I discussed with you last week, I have accepted a position with UTS, starting Feb 9th 2015, and I resign my position with UWS. My last day will be Feb 6th 2015.

Regards, Peter

Dr PETER SEFTON Manager, eResearch, Office of the Deputy Vice-Chancellor (Research & Development) University of Western Sydney

Anticipated FAQ:

  • What? eResearch Support Manager – more or less the same gig as I’ve had at UWS, in a tech-focussed uni with a bigger team, dedicated visualisation service and HPC staff, an actual budget and mostly within a few city blocks.

  • Why UTS? A few reasons.

    • There was a job going, I thought I’d see if they liked me. They did. I already knew some of the eResearch team there. I’m confident we will be good together.

    • It’s a continuing position, rather than the five-year, more-than-half-over contract I was on, not that I’m planning to settle down for the rest of my working life as an eResearch manager or anything.

    • The physical concentration should be conducive to Research Bazaar #resbaz activities such as Hacky Hour.

  • But what about the travel!!!!? It will be 90 minutes laptop time each way on a comfy, reasonably cheap and fairly reliable train service with almost total mobile internet coverage, with a few hundred metres walking on either end. That’s a change from 35-90 minutes each way depending on what campus(es) I was heading for that day and the mode of transport, which unfortunately was mostly motor vehicle. I do not like adding yet another car to Sydney’s M4, M7 or M5, despite what I said in my UWS staff snapshot. I think I’ll be fine with the train. If not, oops. Anyway, there are inner-Sydney family members and mates I’ll see more of if only for lunch.

    Glenbrook Creek

    When the internets stop working the view is at its best. OK, apart from the tunnel and the cuttings.

  • What’s the dirt on UWS? It’s not about UWS, I’ve had a great time there, learned how to be an eResearch manager, worked on some exciting projects, made some friends, and I’ll be leaving behind an established, experienced eResearch team to continue the work. I’m sorry to be going. I’d work there again.

  • Why did you use this mode of announcement? I was inspired by Titus Brown, a few weeks ago.

[updated 2015-01-07 – typo]

Trip report: Peter Sefton @ Open Repositories 2014, Helsinki, Finland

[Update: 2014-07-08, fixed a couple of typos since this is getting a bit of traffic]

Just self-archiving this post from the UWS eResearch blog here

Creative Commons License
Trip report: Peter Sefton @ Open Repositories 2014, Helsinki, Finland by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.

From June 9th to 13th I attended the Open Repositories conference way up North in Helsinki. This year I was not only on the main committee for the conference, but was also part of a new extension to the Program Committee, overseeing the Developer Challenge event, which has been part of the conference since OR2008 in Southampton. I think the dev challenge went reasonably well, but probably requires a re-think for future conferences; more on that below.

In this too-long-you-probably-won’t-read post I’ll run through a few highlights around the conference theme, the keynote and the dev event.

Summary: For me the take-away was that now that we have a repository ecosystem developing, and the OR catchment extends further and further beyond the library, sustainability is the big issue, and conversations around sustainability of research data repositories in particular are going to be key to the next few iterations of this conference. Sustainability might make a good theme or sub-theme. Related to sustainability is risk: how do we reduce the risk of a data equivalent of the serials crisis? If there is such a crisis it won’t look the same as the last one, so how will we stop it?

View from the conference dinner


The keynote this time was excellent. Neuroscientist Erin McKiernan from Mexico gave an impassioned and informed view of the importance of Open Access: Culture change in academia: Making sharing the new norm (McKiernan, 2014). Working in Latin America McKiernan could talk first-hand about how the scholarly communications system we have now disadvantages all but the wealthiest countries.

There was a brief flurry of controversy on Twitter over a question I asked about the risks associated with commercially owned parts of the scholarly infrastructure and how we can manage those risks. I did state that I thought that Figshare was owned by Macmillan’s Digital Science, but was corrected by Mark Hahnel; Digital Science is an investor, so I guess “it is one of the owners” rather than “owns”. Anyway, my question was misheard as something along the lines of “How can you love Figshare so much when you hate Nature and they’re owned by the same company”. That’s not what I meant to say, but before I try to make my point again in a more considered way, some context.

McKiernan had shown a slide like this:

My pledge to be open

  • I will not edit, review, or work for closed access journals.

  • I will blog my work and post preprints, when possible.

  • I will publish only in open access journals.

  • I will not publish in Cell, Nature, or Science.

  • I will pull my name off a paper if coauthors refuse to be open.

If I am going to ‘make it’ in science, it has to be on terms I can live with.

Good stuff! If everyone did this, the Scholarly Communications process would be forced to rationalize itself much more quickly than is currently happening and we could skip the endless debates about the “Green Road” and the “Gold Road” and the “Fools Gold Road”. It’s tragic we’re still debating using this weird colour-coded-speak twenty years into the OA movement.

Anyway, note the mention of Nature.

What I was trying to ask was: How can we make sure that McKiernan doesn’t find herself, in twenty years time, with a slide that says:

“I will not put my data in Figshare”.

That is, how do we make sure we don’t make the same mistake we made with scholarly publishing? You know, where academics write and review articles, often give up copyright in the publishing process, and collectively we end up paying way over the odds for a toxic mixture of rental subscriptions and author-pays open-access, with some risk the publisher will ‘forget’ to make stuff open.

I don’t have any particular problem with Figshare as it is now; in fact I’m promoting its use at my University, and working with the team here on being able to post data to it from our Cr8it data publishing app. All I’m saying is that we must remain vigilant. The publishing industry has managed to transform itself under our noses from: a much-needed distribution service for tangible goods; to a rental service where we get access to The Literature pretty much only if we keep paying; to its new position as The Custodian of The Literature for All Time, usurping libraries as the place we keep our stuff.

We need to make sure that the appealing free puppy offered by the friendly people at Figshare doesn’t grow into a vicious dog that mauls our children or eats up the research budget.

So, remember, Figshare is not just for Christmas.

Disclosure: After the keynote, I was invited to an excellent Thai dinner by the Figshare team, along with Erin and a couple of other conference-goers. Thanks for the salmon and the wine, Mark and the Figshare investors. I also snaffled a few T-shirts from a later event (Disruption In The Publishing Industry: Digital, Analytics & The Future) to give to people back home.

Figshare founder and CEO Mark Hahnel (right) and product manager Chris George hanging out at the conference dinner

Conference Theme, leading to discussions about sustainability

The conference theme was Towards Repository Ecosystems.

Repository systems are but one part of the ecosystem in 21st century research, and it is increasingly clear that no single repository will serve as the sole resource for its community. How can repositories best be positioned to offer complementary services in a network that includes research data management systems, institutional and discipline repositories, publishers, and the open Web? When should service providers build to fill identified niches, and where should they connect with related services?  How might these networks offer services to support organizations that lack the resources to build their own, or researchers seeking to optimize their domain workflows?

Even if I say so myself, the presentation I delivered for the Alveo project (co-authored with others on the team) was highly theme-appropriate; it was all about researcher needs driving the creation of a repository service as the hub of a Virtual Research Environment, where the repository part is important but not the whole point.

I had trouble getting to see many papers, given the dev-wrangling, but there was definitely a lot of ecosystem-ish work going on, as reported by Jon Dunn:

Many sessions addressed how digital repositories can fit into a larger ecosystem of research and digital information. A panel on ORCID implementation experiences showed how this technology could be used to tie publications and data in repositories to institutional identity and access management systems, researcher profiles, current research information systems, and dissertation submission workflows; similar discussions took place around DOIs and other identifiers. Other sessions addressed the role of institutional repositories beyond traditional research outputs to address needs in teaching and learning and administrative settings and issues of interoperability and aggregation among content in multiple repositories and other systems.

One session I did catch (and not just ’cos I was chairing it) had a presentation by Adam Field and Patrick McSweeney on Micro data repositories: increasing the value of research on the web (Field and McSweeney, 2014). This has direct application to what we need to do in eResearch: Adam reported on their experience setting up bespoke repository systems for individual research projects, with a key ingredient that is missing in a lot of such systems, namely maintenance and support from central IT. We’re trying to do something similar at the University of Western Sydney, replicating the success of a working-data repository at one of our institutes (reported at OR2013) across the rest of the university, and I’ll talk more to Adam and Patrick about this.

For me the most important conversation at the conference was around sustainability. We are seeing more research-oriented repositories and Virtual Research Environments like Alveo, and it’s not always clear how these are to be maintained and sustained.

Way back, when OR was mainly about Institutional Publications Repositories (simply called Institutional Repositories, or IRs) we didn’t worry so much about this; the IR typically lived in The Library, the IR was full of documents and The Library already had a mission to keep documents. Therefore the Library can look after the IR. Simple.

But as we move into a world of data repository services there are new challenges:

  • Data collections are usually bigger than PDF files, many orders of magnitude bigger in fact, making it much more of an issue to say “we’ll commit to maintaining this ever-growing pile of data”;

  • “There’s no I in data repostory (sic)” – i.e. many data repositories are cross-institutional, which means that there is no single institution to sustain a repository and collaboration agreements are needed. This is much, much more complicated than a single library saying “We’ll look after that”.

And as noted above, there are commercial entities like Figshare and Digital Science realizing that they can place themselves right in the centre of this new data-economy. I assume they’re thinking about how to make their paid services an indispensable part of doing research, in the way that journal subscriptions and citation metrics services are, never mind the conflict of interest inherent in the same organization running both.

Some libraries are stepping up and offering data services, for example, work between large US libraries.

The dinner venue

The developer challenge

This year we had a decent range of entries for the dev challenge, after a fair bit of tweeting and some friendly matchmaking by yours truly. This is the third time we’ve run the thing with a clearly articulated set of values about what we’re trying to achieve.

All the entrants are listed here, with the winners noted in-line. I won’t repeat them all here, but wanted to comment on a couple.

The people’s choice winner was a collaboration between a person with an idea, Kara Van Malssen from AV Preserve in NY, and a developer from the University of Queensland, Cameron Green, to build a tool to check up on the (surprisingly) varied results given by video characterization software. This team personified the goals of the challenge, creating a new network, while scratching an itch, and impressing the conference-goers who gathered with beer and cider to watch the spectacle of ten five-minute pitches.

My personal favourite, which came from an idea I pitched (see the ideas page), was the Fill My List framework, a start on the idea of a ‘Universal Linked Data metadata lookup/autocomplete’. We’re actually picking up this code and using it at UWS. So while the goal of the challenge is not to get free software development for the organizers, that happened in this case (yes, this conflict of interest was declared at the judging table). Again this was a cross-institutional team (some of whom had worked together and some of whom had not). It was nice that two of the participants, Claire Knowles of Edinburgh and Kim Shepard of Auckland Uni, were able to attend a later event on my trip, a hackfest in Edinburgh. There’s a github page with links to demos.
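The core trick behind a “universal” lookup/autocomplete is unglamorous: every vocabulary service returns differently-shaped JSON, so each source needs a small mapping that reduces its hits to a common label-plus-URI record the autocomplete widget can render. The sketch below illustrates that normalisation step only; the field names and the `SOURCE_MAPPINGS` table are hypothetical, not the actual Fill My List API.

```python
# Per-source mappings from a raw lookup hit to the common shape.
# The raw field names here are illustrative assumptions about what
# each service returns, not documented response formats.
SOURCE_MAPPINGS = {
    "geonames": lambda hit: {"label": hit["name"], "uri": hit["geonameId"]},
    "orcid":    lambda hit: {"label": hit["display_name"], "uri": hit["orcid_uri"]},
}

def normalise(source, hits):
    """Reduce raw lookup results from one source to {label, uri} records."""
    mapping = SOURCE_MAPPINGS[source]
    return [mapping(h) for h in hits]

# Example: a (mocked) GeoNames-style response becomes autocomplete-ready records.
print(normalise("geonames", [
    {"name": "Helsinki", "geonameId": "http://sws.geonames.org/658225/"},
]))
```

The payoff of this design is that adding a new vocabulary service is one new entry in the mapping table rather than a change to the autocomplete front end.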

But, there’s a problem. The challenge seems to be increasingly hard work to run, with fewer entries arising spontaneously at recent events. I talked this over with members of the committee and others. There seem to be a range of factors:

  • The conference may just be more interesting to a developer audience than it used to be. Earlier iterations had a lot more content in the main sessions about ‘what is a(n) (institutional) repository’ and ‘how do I promote my repository and recruit content’ whereas now we see quite detailed technical stuff more often.

  • Developers are often heavily involved in the pre-conference workshops, leaving no time to attend a hack day to kick off the conference.

  • Travel budgets are tighter, so if developers do end up being the ones sent, they’re expected to pay attention and take notes.

I’m going to be a lot less involved in the OR committee etc. next year, as I will be focusing on helping out with Digital Humanities 2015 at UWS. I’m looking forward to seeing what happens next in the evolution of the developer stream at the OR conference. At least the two conferences don’t clash.

The Open Repositories Conference (OR2015) will take place in Indianapolis, Indiana, USA at the Hyatt Regency from June 8-11, 2015. The conference is being jointly hosted by Indiana University Libraries , University of Illinois Urbana-Champaign Library , and Virginia Tech University Libraries .

This pic got a few retweets


Field, A., and McSweeney, P. (2014). Micro data repositories: increasing the value of research on the web. Presentation at Open Repositories 2014, Helsinki.

McKiernan, E. (2014). Culture change in academia: Making sharing the new norm. Keynote at Open Repositories 2014, Helsinki.