
Research Object Crate (RO-Crate) Update

2021-06-11

Peter Sefton & Stian Soiland-Reyes

This presentation by Peter Sefton and Stian Soiland-Reyes was delivered by Peter Sefton at the Open Repositories 2021 conference on 2021-06-10 (in Australia). RO-Crate has been presented at Open Repositories several times, including a workshop in 2019, so we won't go through a very detailed introduction, but we will start with a quick introduction for those who have not seen it before.

☁️
📂
📄 ID? Title? Description?
👩‍🔬👨🏿‍🔬 Who created this data?
📄 What parts does it have?
📅 When?
🗒️ What is it about?
♻️ How can it be reused?
🏗️ As part of which project?
💰 Who funded it?
⚒️ How was it made?
Addressable resources
Local Data
👩🏿‍🔬 https://orcid.org/0000-0001-2345-6789
🔬 https://en.wikipedia.org/wiki/Scanning_electron_microscope

RO-Crate is a method for describing a dataset as a digital object using a single linked-data metadata document, which can describe files and resources that are local or remote and can contain discipline-appropriate context for the data.

📂
🔬 🔭 📹 💽 🖥️ ⚙️ 🎼 🌡️ 🔮 🎙️ 🔍 🌏 📡 💉 🏥 💊 🌪️

The dataset may contain any kind of data resource about anything, in any format, as a file or URL.

📂
|-- Folder1/
|          |-- file1.this
|          |-- file2.that
|-- Folder2/
|          |-- file1.this
|          |-- file2.that
|-- 2021-04-08 07.58.17.jpg

{
  "@id": "2021-04-08 07.58.17.jpg",
  "@type": "File",
  "contentSize": 3271409,
  "dateModified": "2021-04-08T07:58:17+10:00",
  "description": "",
  "encodingFormat": [
    {
      "@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/391"
    },
    "image/jpeg"
  ],
  "name": "Cute puppy"
},

Each resource can have a machine-readable description in JSON-LD format.
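To make the idea of a per-file description concrete, here is a minimal sketch of how a script might build an entity like the "Cute puppy" one on the slide. The helper name `describe_file` is hypothetical and not part of any RO-Crate library; it assumes the file sits at the top level of the crate and it guesses the media type from the file extension, which is only a rough stand-in for proper format identification such as PRONOM.

```python
import mimetypes
import os
from datetime import datetime, timezone


def describe_file(path, name, description=""):
    """Build a File entity for one local file (hypothetical helper)."""
    stat = os.stat(path)
    # Guess the media type from the extension only; no content sniffing.
    media_type, _encoding = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    entity = {
        # Assumption: the file lives at the crate root, so its name is its @id.
        "@id": os.path.basename(path),
        "@type": "File",
        "contentSize": stat.st_size,
        "dateModified": modified.isoformat(timespec="seconds"),
        "name": name,
    }
    if description:
        entity["description"] = description
    if media_type:
        entity["encodingFormat"] = media_type
    return entity
```

Calling `describe_file("puppy.jpg", "Cute puppy")` on a real file would return a dictionary that can be dropped into the metadata document's `@graph` list and serialised with `json.dumps`.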

📂
|-- Folder1/
|          |-- file1.this
|          |-- file2.that
|-- Folder2/
|          |-- file1.this
|          |-- file2.that
|-- 2021-04-08 07.58.17.jpg

A human-readable description and preview can be in an HTML file that lives alongside the metadata.

What does this mean for repositories? It means that a repository can show the contents of a digital object using either a standard display library, or a customised one.
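As a rough illustration of what a very simple display library could do, the sketch below turns a metadata graph into a bare-bones HTML summary. The function name `render_preview` is hypothetical, and the code assumes the root dataset has the `@id` "./" and lists its files under `hasPart`; real preview generators are much richer than this.

```python
import html


def render_preview(crate):
    """Render a minimal HTML summary of an RO-Crate metadata graph (sketch)."""
    entities = {entity["@id"]: entity for entity in crate["@graph"]}
    # Assumption: the root data entity is identified by "./".
    root = entities["./"]
    items = []
    for ref in root.get("hasPart", []):
        part = entities.get(ref["@id"], {})
        label = part.get("name", ref["@id"])
        items.append(
            "<li>{} ({})</li>".format(html.escape(label), html.escape(ref["@id"]))
        )
    return "<html><body><h1>{}</h1><p>{}</p><ul>{}</ul></body></html>".format(
        html.escape(root.get("name", "Untitled dataset")),
        html.escape(root.get("description", "")),
        "".join(items),
    )
```

The returned string could be written out next to the metadata file so that people who open the folder see something readable without any special tooling.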

♻️
📂
📈 Chart1
🏭 CreateAction
Date: 2021-04-01
⚙️ Software / workflow
Name: My Workflow
URL: https://example.com/workflow/1235
🔬 instrument
result
👩🏽‍🔬 Agent

Provenance and workflow information can be included - to assist in data and research-process re-use.
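The provenance slide can be sketched as two linked entities: the software that was run and a CreateAction that ties it to its result and to the person who ran it. This is only an illustration using the values from the slide; the helper name `create_action`, the `#chart1-creation` identifier, the use of `endTime` for the slide's date and the choice of `SoftwareApplication` as the workflow's type are all assumptions.

```python
import json


def create_action(action_id, name, end_time, instrument_id, result_ids, agent_id):
    """Build a CreateAction entity linking software, results and an agent (sketch)."""
    return {
        "@id": action_id,
        "@type": "CreateAction",
        "name": name,
        "endTime": end_time,  # assumption: the slide's "Date" maps to endTime
        "instrument": {"@id": instrument_id},
        "result": [{"@id": result_id} for result_id in result_ids],
        "agent": {"@id": agent_id},
    }


# Contextual entity for the software / workflow shown on the slide.
workflow = {
    "@id": "https://example.com/workflow/1235",
    "@type": "SoftwareApplication",
    "name": "My Workflow",
}

action = create_action(
    "#chart1-creation",  # hypothetical local identifier
    "Create Chart1",
    "2021-04-01",
    workflow["@id"],
    ["Chart1"],
    "https://orcid.org/0000-0001-2345-6789",
)

print(json.dumps([workflow, action], indent=2))
```

Both dictionaries would sit in the same `@graph` as the file entities, so a reader or a tool can follow the links from a result back to the software and person behind it.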

What does this mean for repositories? Repositories will be able to launch software environments; if a digital object can be run in an emulator or a notebook environment, then there is potential for the repository to launch it.


🎁🗜️
📮🚚

RO-Crate Digital Objects may be packaged for distribution, e.g. via Zip, BagIt and OCFL objects.
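Of the packaging options, Zip is the simplest to show. The sketch below zips a crate directory so that the metadata file and data files keep their paths relative to the crate root. The function name `zip_crate` is hypothetical, and the code assumes the zip file is written somewhere outside the crate directory so that it does not end up inside itself.

```python
import os
import zipfile


def zip_crate(crate_dir, zip_path):
    """Zip every file under crate_dir, keeping paths relative to the crate root."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _dirs, files in os.walk(crate_dir):
            for filename in sorted(files):
                full_path = os.path.join(folder, filename)
                relative = os.path.relpath(full_path, crate_dir)
                archive.write(full_path, arcname=relative)
                names.append(relative)
    return names
```

BagIt and OCFL add their own manifests and directory layouts on top of the crate, which this sketch does not attempt.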


v1.1

Since the last Open Repositories we have reached v1.1. The main changes are tidying up the file extension and making it clear that RO-Crates are not just packages of files - they are aggregations of local and remote objects. We'll cover some other changes as well in the rest of the talk.



RO-Crate Tools keep coming.



RO-Crate is being adopted in a number of projects.



And RO-Crate is a foundation standard of the Arkisto platform - which was covered in the presentation before this one.



The RO-Crate team is now working on profiles - these will guide both humans and validation tools in using RO-Crate for specific purposes.

Image by Bryan Derksen - Original image Cup or faces paradox.jpg uploaded by Guam on 28 July 2005, SVG conversion by Bryan Derksen, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1733355

Machine and human readable, search engine friendly and developer familiar.
FAIR Object middleware
http://www.researchobject.org/ro-crate/
Standard Web Native PIDs + JSON-LD + Schema.org, off the shelf archiving formats
Self-describing Typed by profiles + add more schema.org and domain ontologies
Extensible, descriptive and content openness, honouring legacy, diversity, and known and unknown unknowns - one size does not fit all.
A valid RO-Crate JSON-LD graph MUST describe:
The RO-Crate Metadata File Descriptor
The Root Data Entity
Zero or more Data Entities
Zero or more Contextual Entities
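The list of required parts above can be turned into a very small structural check. This is a sketch rather than a validator: the function names are hypothetical, it looks only for the metadata file descriptor and the root data entity it points to, and it assumes the RO-Crate 1.1 conventions of `ro-crate-metadata.json` as the descriptor's `@id` and `Dataset` as the root's type.

```python
METADATA_ID = "ro-crate-metadata.json"


def as_list(value):
    """Normalise a JSON-LD value that may be missing, a single item or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_crate(crate):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    entities = {entity.get("@id"): entity for entity in crate.get("@graph", [])}
    descriptor = entities.get(METADATA_ID)
    if descriptor is None:
        problems.append("missing RO-Crate Metadata File Descriptor")
        return problems
    # The descriptor's "about" property points at the root data entity.
    root_id = descriptor.get("about", {}).get("@id")
    root = entities.get(root_id)
    if root is None:
        problems.append("descriptor does not point at a Root Data Entity")
    elif "Dataset" not in as_list(root.get("@type")):
        problems.append("Root Data Entity is not a Dataset")
    return problems


# A minimal graph: descriptor plus root, with zero data and contextual entities.
minimal = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset", "name": "Example dataset"},
    ],
}

print(check_crate(minimal))
```

Data entities and contextual entities are optional ("zero or more"), which is why the minimal example passes without any.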

We are working on aligning RO-Crate with the work going on internationally on FAIR Digital Objects - coming from the standpoint of having a working FAIR-inspired way to create digital objects already.



From a forthcoming paper by Soiland-Reyes et al.:

WorkflowHub.eu is a European cross-domain registry of computational workflows, supported by European Open Science Cloud projects, e.g. EOSC-Life, and research infrastructures including the pan-European bioinformatics network ELIXIR. As part of promoting workflows as reusable tools, WorkflowHub includes documentation and high-level rendering of the workflow structure independent of its native workflow definition format. The rationale is that a domain scientist can browse all relevant workflows for their domain, before narrowing down their workflow engine requirements. As such, the WorkflowHub is intended largely as a registry of workflows already deposited in repositories specific to particular workflow languages and domains, such as UseGalaxy.eu and Nextflow nf-core.



RO-Crate is featuring in discussions with Dataverse as a way of packaging data.



RO-Crate is going to be integrated with Zenodo as part of the CS3MESH4EOSC project, and by extension presumably with the Invenio digital library framework. Of course, RO-Crates can be deposited as they are, wrapped as Zip files.

Other discussions / work going on
Ecological data description (via University of Queensland)
Machine-actionable Research Data Management Plans - e.g. mapping to RO-Crate
BioExcel - discussions are taking place
Via Australian Research Data Commons:
    Australian Text Analytics Platform - data object description for Jupyter notebooks and other workspaces
    Language Data Commons - potential building on techniques used in PARADISEC
BioCompute Objects (BCO) - a community-led effort to standardise submissions of computational workflows to biomedical regulators
And IBISBA, ELIXIR, the EOSC-Life Cluster project, the DiSSCo SYNTHESYS+ SDR pipelines and the EOSC Reliance project in geosciences
A major Japanese institute (via Paul Walk)

There are now enough things happening with RO-Crate that it is getting hard to keep track of it all - this slide is an incomplete view of what's happening now.



RO-Crate is an open group - anyone can sign up. We have meetings twice a month that alternate between European morning / Australian late afternoon and European late evening / Australian early morning.