Meet RO-Crate

By Peter Sefton

This presentation was given by Peter Sefton at the eResearch Australasia 2019 Conference in Brisbane, on the 24th of October 2019.

This presentation is part of a series of talks delivered here at eResearch Australasia, so it won’t go back over all of the detail already covered - see the introduction of DataCrate in 2017 and the 2018 update. The standard formerly known as DataCrate has been subsumed into a new standard called Research Object Crate - RO-Crate for short.

Eoghan Ó Carragáin https://orcid.org/0000-0001-8131-2150 (chair)
Peter Sefton https://orcid.org/0000-0002-3545-944X (co-chair)
Stian Soiland-Reyes https://orcid.org/0000-0001-9842-9718 (co-chair)
Oscar Corcho https://orcid.org/0000-0002-9260-0753
Daniel Garijo https://orcid.org/0000-0003-0454-7145
Raul Palma https://orcid.org/0000-0003-4289-4922
Frederik Coppens https://orcid.org/0000-0001-6565-5145
Carole Goble https://orcid.org/0000-0003-1219-2137
José María Fernández https://orcid.org/0000-0002-4806-5140
Kyle Chard https://orcid.org/0000-0002-7370-4805
Jose Manuel Gomez-Perez https://orcid.org/0000-0002-5491-6431
Michael R Crusoe https://orcid.org/0000-0002-2961-9670
Ignacio Eguinoa https://orcid.org/0000-0002-6190-122X
Nick Juty https://orcid.org/0000-0002-2036-8350
Kristi Holmes https://orcid.org/0000-0001-8420-5254
Jason A. Clark https://orcid.org/0000-0002-3588-6257
Salvador Capella-Gutierrez https://orcid.org/0000-0002-0309-604X
Alasdair J. G. Gray https://orcid.org/0000-0002-5711-4872
Stuart Owen https://orcid.org/0000-0003-2130-0865
Alan R Williams https://orcid.org/0000-0003-3156-2105

This is a recent snapshot of the makeup of the current RO-Crate team, compiled by Stian.

What is RO-Crate?

The website says: RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.
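To make "schema.org annotations in JSON-LD" a little more concrete, here is a minimal sketch of what such metadata could look like when built in Python. The identifiers, names and file paths are made up for illustration, and the plain schema.org context stands in for whatever context a given RO-Crate release actually specifies.

```python
import json

# Illustrative only: every @id, name and path here is hypothetical, and the
# bare schema.org context is an assumption, not the official RO-Crate context.
crate = {
    "@context": "http://schema.org/",
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example photo collection",
            "description": "A folder of data described with schema.org terms.",
            "hasPart": [{"@id": "pics/fence.jpg"}],
        },
        {
            "@id": "pics/fence.jpg",
            "@type": "MediaObject",
            "name": "Photo of a fence",
            "encodingFormat": "image/jpeg",
        },
    ],
}


def find_entity(graph, entity_id):
    """Return the entity with the given @id, or None if it is absent."""
    return next((e for e in graph if e.get("@id") == entity_id), None)


# Follow the hasPart references from the root dataset to the described files.
root = find_entity(crate["@graph"], "./")
parts = [find_entity(crate["@graph"], ref["@id"]) for ref in root["hasPart"]]

print(json.dumps(crate, indent=2))
```

The graph is flat: entities refer to each other by `@id` rather than by nesting, which keeps lookups like `find_entity` simple.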

2017-06-16 Cry for help! Cameron Neylon: As a researcher...
2017-07-02 Research Data Crate started
2017-10-12 DataCrate 0.1
2018-03-22 DataCrate 0.2
2018-03-22 RDA BoF: Approaches to Research Data Packaging
2018-08-06 DataCrate 0.3
2018-09-11 Calcyte 0.3.0
2018-09-27 DataCrate 1.0
2018-10-02 npm install calcyte@1.0.0
2018-10-29 Workshop on Research Object RO2018
2019-02-13 RO Lite 0.1
2019-03-28 First RO-Lite community call
2019-05-02 RO-Crate use case gathering
2019-05-30 Google Docs-mode
2019-06-07 Open Repositories workshop: Research Data Packaging
2019-08-23 npm install calcyte@1.0.6
2019-09-24 Workshop on Research Object RO2019
2019-09-12 RO-Crate 0.2
2019-11-?? RO-Crate 1.0

This is a timeline for the merging of the Research Object packaging work with DataCrate - again compiled by Stian. Our DataCrate work was driven by practical concerns and a desire to describe research data with high-quality metadata. Research Object shared those concerns, but with more of a focus on reproducibility and detailed provenance for research data.

This is what an RO-Crate looks like if you open the HTML file that’s in the root directory (or you see one on the web).

This is the home page for RO-Crate.

Where did RO-Crate come from? RO-Crate is the marriage of Research Objects with DataCrate. It aims to build on their respective strengths, but also to draw on lessons learned from those projects and similar research data packaging efforts. For more details, see background.


Who is it for?

The RO-Crate effort brings together practitioners from very different backgrounds, and with different motivations and use-cases. Among our core target users are: a) researchers engaged with computation and data-intensive, workflow-driven analysis; b) digital repository managers and infrastructure providers; c) individual researchers looking for a straight-forward tool or how-to guide to “FAIRify” their data; d) data stewards supporting research projects in creating and curating datasets.

RO-Crate is a collaboration between people all over the world, but the Editors are from Cork, Manchester and Katoomba. Version one of the standard will be out by Summer. But which summer? Standard reference points are important. Standards are important.

Which brings us to the benefits of Standards. Without this standardised date format chaos would reign. What if that date had been written 05/08 or 08/05 - someone might end up eating food from May in August, or worse, eating last August’s food in May.
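The ambiguity is easy to demonstrate. The sketch below parses the same slash-separated string under two conventions and compares that with an ISO 8601 date; the year 2019 is an assumption added so the strings can be parsed.

```python
from datetime import date, datetime

# Assumed example: the year is added only so the string is parseable.
ambiguous = "05/08/2019"

# The same characters give two different days depending on the convention.
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # 5 August
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # 8 May

# An ISO 8601 date has only one reading.
iso = date.fromisoformat("2019-08-05")

# ISO 8601 dates also sort chronologically as plain strings.
stamps = ["2019-10-24", "2017-06-16", "2018-09-27"]
in_order = sorted(stamps)

print(day_first, month_first, iso, in_order)
```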

Anyway, if you find a partner who’ll adopt the ISO 8601 date standard then ...

… you should marry them.

Like how we married the Research Object and DataCrate - we bonded over standardisation.

Let’s explore standards a bit more. If you see this in metadata - what does it mean?

Is it a name given to the resource? URI: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/title/

An honorific like Ms, or Dr? As it would be in the FOAF ontology.

Or a very specific meaning relating to job titles? As in Schema.org.

In RO-Crate - there’s an HTML page which ships with each dataset that allows you to browse the object in as much detail as the author described it and we are careful to avoid ambiguity by adding help links to each metadata term so you see the definition.
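The "title" problem can be sketched in a few lines: the same short term expands to a different URI, and so a different definition, depending on which vocabulary is in play. The `expand_term` helper below is a much-simplified stand-in for JSON-LD context processing, written for illustration only; the three namespace URIs are the published ones for Dublin Core terms, FOAF and Schema.org.

```python
# One simplified "context" per vocabulary; real JSON-LD contexts do much more.
CONTEXTS = {
    "Dublin Core": {"@vocab": "http://purl.org/dc/terms/"},
    "FOAF": {"@vocab": "http://xmlns.com/foaf/0.1/"},
    "Schema.org": {"@vocab": "http://schema.org/"},
}


def expand_term(term, context):
    """Expand a short term to a full URI (hypothetical, simplified helper)."""
    if term in context:
        return context[term]          # explicit mapping wins
    return context["@vocab"] + term   # otherwise fall back to the vocabulary


expanded = {name: expand_term("title", ctx) for name, ctx in CONTEXTS.items()}

for name, uri in expanded.items():
    print(f"{name}: title -> {uri}")
```

Because each expanded URI is distinct, a help link built from it can point a reader at exactly one definition.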

Just wanted to shout out to ResearchGraph - led by Amir Aryani at Swinburne Uni - they are also using schema.org.

RO-Crates ship with two files, a human readable one and a machine readable JSON file. The two views (human and machine) of the data are equivalent - in fact the HTML version is generated from the JSON-LD version, via the DataCrate nodejs library.
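As a rough illustration of how a human-readable view can be derived from the machine-readable one, here is a small Python sketch that turns a JSON-LD style graph into an HTML page. It is not the DataCrate nodejs library and makes no claim about how that library works; the entities and function names are invented for the example.

```python
import html

# Hypothetical entities, in the flat @id-linked style used by JSON-LD graphs.
graph = [
    {"@id": "./", "@type": "Dataset", "name": "Sample crate",
     "hasPart": [{"@id": "pics/fence.jpg"}]},
    {"@id": "pics/fence.jpg", "@type": "MediaObject", "name": "Fence photo",
     "encodingFormat": "image/jpeg"},
]


def render_entity(entity):
    """Render one entity as an HTML section with a property table."""
    rows = []
    for key, value in entity.items():
        if key.startswith("@"):
            continue  # @id and @type are shown in the heading, not the table
        values = value if isinstance(value, list) else [value]
        text = ", ".join(v["@id"] if isinstance(v, dict) else str(v) for v in values)
        rows.append(f"<tr><th>{html.escape(key)}</th><td>{html.escape(text)}</td></tr>")
    anchor = html.escape(entity["@id"])
    title = html.escape(str(entity.get("name", entity["@id"])))
    table = "".join(rows)
    return f'<section id="{anchor}"><h2>{title}</h2><table>{table}</table></section>'


def render_page(entities):
    """Render the whole graph as a minimal HTML document."""
    body = "\n".join(render_entity(e) for e in entities)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"


page = render_page(graph)
print(page)
```

Generating the page from the metadata, rather than maintaining both by hand, is what keeps the two views equivalent.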

And here’s an automatically generated diagram extracted from the sample DataCrate showing how two images were created. The first result was an image file taken by me (as an agent) using two instruments (my camera and lens), of a place (the object: Catalina park in Katoomba). A sepia toned version was the result of a CreateAction, with the instrument this time being the ImageMagick software. The DataCrate also contains information about that CreateAction such as the command used to do the conversion and the version of the software-as-instrument.

convert -sepia-tone 80% test_data/sample/pics/2017-06-11\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg

This way of representing file provenance is Action-centred - the focus is on the action that creates a file, rather than the more usual metadata approach of having the file at the centre with properties for “Author” and the like. The action-based approach is MUCH more flexible as it can model the contribution of multiple agents and instruments separately at the expense of being somewhat counter-intuitive to those of us who are used to a library-card approach to metadata where the work is at the centre and has simple properties.
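A sketch of the action-centred shape, using Python dicts in place of JSON-LD. The two file paths come from the command above; the `#...` identifiers are hypothetical, and treating the original photo as the `object` of the sepia action is an assumption made for the example.

```python
ORIGINAL = "pics/2017-06-11 12.56.14.jpg"
SEPIA = "pics/sepia_fence.jpg"

# Hypothetical identifiers; agent, instrument, object and result are
# schema.org properties of an Action.
graph = [
    {"@id": "#take-photo", "@type": "CreateAction",
     "agent": {"@id": "#photographer"},
     "instrument": [{"@id": "#camera"}, {"@id": "#lens"}],
     "object": {"@id": "#catalina-park"},
     "result": {"@id": ORIGINAL}},
    {"@id": "#sepia-conversion", "@type": "CreateAction",
     "instrument": {"@id": "#imagemagick"},
     "object": {"@id": ORIGINAL},   # assumption: the input photo is the object
     "result": {"@id": SEPIA}},
]


def ids(value):
    """Return the @id strings from a single reference or a list of them."""
    refs = value if isinstance(value, list) else [value]
    return [ref["@id"] for ref in refs]


def actions_creating(entities, file_id):
    """Find every CreateAction whose result is the given file."""
    return [e for e in entities
            if e.get("@type") == "CreateAction"
            and file_id in ids(e.get("result", []))]


for action in actions_creating(graph, SEPIA):
    print(action["@id"], "used", ids(action["instrument"]))
```

Note that the files carry no "author" property at all: who and what contributed is read off the actions that produced them.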

There was a question after this presentation about whether I had the arrows in this diagram pointing in the right direction. Yes, I do! The convention here is the standard way of representing a subject-predicate-object semantic triple with the subject as the source of the arrow, the predicate (in this case a Schema.org property) as a label, and the pointy end pointing at the object.
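The arrow convention can be spelled out in code. This sketch walks a small JSON-LD style graph and emits one (subject, predicate, object) triple per property value, then prints each as an arrow running from subject to object. The entity identifiers are hypothetical.

```python
# Hypothetical one-entity graph: an action that results in a file.
graph = [
    {"@id": "#take-photo", "@type": "CreateAction",
     "agent": {"@id": "#photographer"},
     "result": {"@id": "pics/original.jpg"}},
]


def triples(entities):
    """Yield (subject, predicate, object) for every property of every entity."""
    for entity in entities:
        subject = entity["@id"]
        for predicate, value in entity.items():
            if predicate.startswith("@"):
                continue  # skip JSON-LD keywords such as @id and @type
            for item in (value if isinstance(value, list) else [value]):
                obj = item["@id"] if isinstance(item, dict) else item
                yield (subject, predicate, obj)


all_triples = list(triples(graph))
for s, p, o in all_triples:
    # The arrow starts at the subject, is labelled with the predicate,
    # and its pointy end is at the object.
    print(f"{s} --{p}--> {o}")
```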

What’s new / developing at the moment in the RO-Crate world? I will illustrate by looking at recent activity on our GitHub project.

We’re working on ways to describe not just files, but the CONTENTS of files - using properties like variableMeasured.

We have a way to describe a workflow

and actions that can be performed on data such as firing up a computational environment to re-run the workflow.

You too can add Use Cases like this one about software containers.

Breaking news: In the last couple of months Marco La Rosa, an independent developer working for PARADISEC, has ported 10,000 data and collection items into RO-Crate format, AND built a portal which can display them. This means that ANY repository with a similar structure (Items in Collections) could easily re-use the code and the viewers for various file types.

http://45.113.232.73/paradisec.org.au/NT1/98007

This shows an interlinear transcription where you can play various segments of a recording and see the transcription.

The .eaf files in the previous example are produced using ELAN software. Marco has done the groundwork for a system that could work across multiple repositories and for stand-alone RO-Crates - the crate metadata describes the files, and what format they’re in, and the viewer which is an HTML page either served by a repository or possibly just off your hard disk, can use that information to load an appropriate viewer.
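The idea of picking a viewer from the crate metadata can be sketched as a simple lookup. Everything named here is hypothetical: the viewer names, the registry, and in particular the `elan-eaf` format label, which is a placeholder rather than a registered media type. This is not Marco's code, just an illustration of the dispatch pattern.

```python
# Hypothetical registry mapping a declared file format to a viewer component.
VIEWERS = {
    "elan-eaf": "transcription-viewer",   # placeholder label for ELAN .eaf files
    "audio/mpeg": "audio-player",
    "image/jpeg": "image-viewer",
}
DEFAULT_VIEWER = "download-link"


def choose_viewer(entity):
    """Pick a viewer from the entity's encodingFormat, with a safe fallback."""
    formats = entity.get("encodingFormat", [])
    if not isinstance(formats, list):
        formats = [formats]
    for fmt in formats:
        if fmt in VIEWERS:
            return VIEWERS[fmt]
    return DEFAULT_VIEWER


# Hypothetical file entities as they might appear in crate metadata.
files = [
    {"@id": "recording.eaf", "encodingFormat": "elan-eaf"},
    {"@id": "recording.mp3", "encodingFormat": "audio/mpeg"},
    {"@id": "notes.bin"},
]

for f in files:
    print(f["@id"], "->", choose_viewer(f))
```

Because the choice depends only on the metadata, the same page could work whether it is served by a repository or opened straight from disk.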

RO-Crate will be released in version 1 in November 2019 - we were aiming for October, but missed that.

We will publish the parts that are well-tested and stable, and immediately start on a new version with bleeding-edge cases.

We want input from potential users and from current and prospective implementers, and help drafting new parts of the spec is welcome.

You can join the team

This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/">Creative Commons Attribution 3.0 Australia License</a>.
