
Principles for eResearch Systems Development and Selection

2017-03-09

This was originally posted on 2014-08-08 at the eResearch Blog at the (then) University of Western Sydney where I led the eResearch team. These were the principles we used to inform our work on eResearch systems, as written up by the wry, sardonic David Clarke and me.

I'll see what the folks at UTS think...

This blog post is a first attempt at a set of principles and best practices that we should strongly encourage for eResearch.

Summary manifesto

... they’re all data, and hence, should not be without metadata.

No Data Without Metadata

Metadata tells you what the corresponding data actually is; without it, we do not know what the data really means. We should capture metadata as soon as practical, preferably with the data to which it applies.
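One way to capture metadata with the data it applies to is to write both in the same step. The following is a minimal sketch, not a prescribed method: the function name `save_with_metadata`, the `.metadata.json` sidecar suffix and the field names are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(data: bytes, data_path: Path, context: dict) -> Path:
    """Write a data file and a sidecar JSON metadata file together.

    The sidecar suffix and the field names are illustrative assumptions,
    not a standard.
    """
    data_path.write_bytes(data)
    record = {
        "file": data_path.name,
        # A checksum ties the metadata to this exact version of the data.
        "sha256": hashlib.sha256(data).hexdigest(),
        "captured": datetime.now(timezone.utc).isoformat(),
        # Free-form context supplied by the caller (instrument, operator, ...).
        "context": context,
    }
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Because the data and its description are written by one call, there is no window in which the data exists without metadata.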

Using URIs for subjects, predicates and (where applicable) values gives us precision and clarity. The semantics of ontologies are well defined, and the ability to refer to data objects via a globally unique, completely unambiguous reference will support the reuse of that data – one of the main pillars of eResearch. In general, Tim Berners-Lee's Five Stars of Linked Open Data are relevant. Not all research data can or should be made available as open data, but linked data is still better than non-linked data.

We acknowledge that science data formats and instrument makers have their own metadata formats, as do the library community and agencies such as the Australian National Data Service, and that these may not be RDF or Linked Data ready. Even so, we should use RDF and/or URIs as identifiers wherever possible. This includes storing metadata as RDF in our repositories. The abilities this gives us to link data and to search the metadata are too powerful to give up.
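To make the idea of URIs for subjects, predicates and values concrete, here is a small sketch that models triples and writes them out as N-Triples using only the Python standard library. The `example.org` URIs are placeholders, the Dublin Core predicates are just one possible vocabulary, and the class and function names are invented for this illustration; a real system would more likely use an RDF library.

```python
from typing import Iterable, NamedTuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # a value may be another URI or a plain string


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    lines = []
    for t in triples:
        if isinstance(t.obj, URI):
            obj = f"<{t.obj.value}>"
        else:
            # Escape the characters that would break a quoted literal.
            escaped = (
                t.obj.value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{t.subject.value}> <{t.predicate.value}> {obj} .")
    return "\n".join(lines)


# Placeholder identifiers, assumed for the example only.
dataset = URI("https://example.org/dataset/42")
triples = [
    Triple(dataset, URI("http://purl.org/dc/terms/title"), Literal("Soil core samples")),
    Triple(dataset, URI("http://purl.org/dc/terms/creator"), URI("https://example.org/person/7")),
]
print(to_ntriples(triples))
```

The creator is itself a URI rather than a name string, so anything else known about that person can be linked to the same identifier.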

Separation Of Concerns

We strongly suspect that finding one single system which can do all things for all researchers is not going to happen. Instead, we believe that we should look to building ecosystems of collaborating systems, talking to each other over (preferably) standard APIs with each system doing specific tasks and doing them well.
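As a rough illustration of systems that each do one job and cooperate through narrow interfaces, the sketch below defines two hypothetical services, a `Storage` and a `Catalogue`, and an `ingest` function that depends only on their interfaces. All names here are assumptions; in practice the interfaces would be network APIs rather than in-process Python classes.

```python
from typing import Protocol


class Storage(Protocol):
    """Keeps bytes safe; knows nothing about what they mean."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class Catalogue(Protocol):
    """Keeps descriptions; knows nothing about where the bytes live."""

    def register(self, key: str, metadata: dict) -> None: ...
    def describe(self, key: str) -> dict: ...


class InMemoryStorage:
    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class InMemoryCatalogue:
    def __init__(self) -> None:
        self._records = {}

    def register(self, key: str, metadata: dict) -> None:
        self._records[key] = dict(metadata)

    def describe(self, key: str) -> dict:
        return self._records[key]


def ingest(storage: Storage, catalogue: Catalogue,
           key: str, data: bytes, metadata: dict) -> None:
    """Store data and register its description; refuse data with no metadata."""
    if not metadata:
        raise ValueError("no data without metadata")
    storage.put(key, data)
    catalogue.register(key, metadata)
```

Either implementation can be swapped for another that offers the same methods without `ingest` changing, which is the point of keeping the concerns separate.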

Exposing services via well-defined APIs gives several benefits:

Data Are Everything

Data includes inputs, results and physical specimens.

Metadata includes information about all the context in which research is conducted: where, what machines, which chemicals, which edition of the book, the temperature of the apparatus, anything that might influence the results.

At the core of eResearch practice is keeping data safe (remember: No Data Without Metadata). Different classes of data are safest in different homes, but ideally each data set or item should live in a repository, where:

Types of repositories include:

Yes, we should add references to this document.

Creative Commons License: Principles for eResearch Systems Development and Selection by David Clarke & Peter Sefton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


  1. Yes, pedants, we know: Data Are Everything

  2. This one we are sure about