End-to-End Research Data Management for the Responsible Conduct of Research at the University of Technology Sydney
2018-07-04
This presentation was written by Louise Wheeler, Sharyn Wise and me for the Asia Pacific Research Integrity 2018 meeting in Taiwan, Feb 2018 it was scripted and delivered by Louise, who is the UTS Manager, Research Integrity and Research Program and Sharyn works in the eResearch team with me. This is good introduction to the work we've been doing on the UTS provisioner project from Louise's Research Integrity (RI) perspective. There's not much technical detail in this talk about the open source ReDBox platform on which our data management system, Stash, is built. I'll post more soon about that.
Thanks also to Chris Evenhuis for some slide design.
I'm posting this both on the eResearch blog and at my site.
Notes
Today I’ll be presenting the UTS approach to Research Data Management (RDM), and highlighting how the development of a practical and comprehensive solution can enable researchers to meet their RI obligations
I’ll run through a brief history that lead to our approach, and then I will present an overview of the solution we are implementing.
Formal RI training may occur early in a researchers career, but otherwise becomes assumed knowledge. Awareness of responsible research practices is not often actively supported or promoted throughout an academic's career, at least not consistently at the institutional level. At UTS we have pockets of opportunities for promoting research integrity principles, and one such pocket is the eResearch team – who have regular face-time with researchers when supporting them with tools and practices. We are using these opportunities to build a comprehensive framework that will enable researchers to meet RI principles, through improved systems and practices.
Notes
Australia’s current code has been in place since 2007. It is a relatively prescriptive document that guides institutions and researchers as to their responsibilities in meeting integrity principles, including management of research data.
Despite these guidelines, at that time there was no defined or coordinated approach to research data management at UTS. Any management relied on the best endeavours of individual researchers to protect their data. They continued to store their data on file-servers; in drawers, in unlabelled hard drives, in their garages etc.
Notes
It was not until 2013 that national infrastructure funding allowed Australian Universities to develop Research Data catalogues. For the first time researchers were encouraged to think of their data as an asset to be managed, as opposed to simply kept. At the same time, national requirements to include RDM plans in proposals to our major funding bodies, gave universities a bigger driver for changing culture.
At UTS we began to govern research data at the Policy level, which required researchers to plan their data management, At the systems level, because of the new focus on RDM requirements, we had the necessary senior-level investment to develop our research data catalogue (which we call Stash).
Take up of Stash over the next two years was very slow. With minimal resources, the strategy for promoting Stash focused on educating graduate research students and thereby indirectly, their supervisors. It was difficult to persuade researchers that sharing their data was a good idea, and that storage should be on university owned and controlled systems.
And, since the ANDS funding rules didn’t allow for development of data repositories, our catalogues weren’t integrated with our storage solutions. It was clear at that point that researchers needed an integrated data management solution.
Notes
In 2016 efforts around RI and RDM began to align. We rolled out a new Research Integrity Framework while the eResearch team produced a Strategy and roadmap that explicitly linked data management to Integrity requirements and during 2017, our research policy environment was redrafted in the same vein.
Notes
Collectively, these efforts enabled us to put the pieces in place for an end-to-end data management solution and the Provisioner project was born.
Notes
This is Provisioner. It looks complex, but I will walk you through it using a couple of examples that demonstrate how it supports both RDM and research integrity.
Notes
A researcher (top centre) accesses the data management catalogue “Stash”, which has three parts: (1) Create a data management plan (as required by our policy environment), to describe how their data will be collected, analysed, stored and accessed. (2) Access research data catalogue listing where archived data sets can be found
Notes
And from the plan, they (will by the end of 2018) have access to (3) the innovative provisioning tool which allows researchers to provision workspaces such as file-shares, electronic notebooks, and repositories for programming code,
Notes
They can request and link to data storage, and at the end of the project archived and publish their datasets.
Notes
The Provisioner will also have a range of reporting tools and automated ‘bots’ than generate useful reports for the Research Integrity team – identifying ‘orphaned’ workspaces, or auto-creating data management plans for researcher or research students.
Information is pulled in from our existing databases to populate the RDMPs, again saving researchers time.
Notes
We had a recent example in a science lab where some code from one project was modified for use in a new project. Most researchers are not aware that code is considered research data that must be retained, particularly in the sciences where they are not trained in IT. So no copies of the code were retained, and that data is now lost. In the Provisioner model, Gitlab solves this problem without the researcher having to stop and think about their research integrity obligations by saving each version the code and creating a revision history.
By introducing Provisioner, we are not implying that researchers can remain blissfully unaware of their responsibilities, but we are supporting them by providing a positive and streamlined experience, with the added comfort of understanding that it comprehensively addresses responsible research practices. We will continue to raise awareness by linking integrity training to our practical training around data mgmt, and other element of the project lifecycle.
We’ve found that bringing researchers into contact with our eResearch experts is an effective way of supporting and building an integrity culture. Our researchers tend to prefer practice-based over principles-based education: so we are aiming to build the principles into our workshops/clinics (e.g. Research ethics practices, Research data management practices, Publications workshops that focus on open access and reproducibility
Notes
Another example of how Provisioner addresses RCR is the incorporation of the LabArchives/ELN (Electronic Lab Notebook)(top left). As this model is being implemented uni-wide, it gives us an institutional advantage of oversight. eNotebooks developed at the lab or faculty level leads to inconsistent practices and means that data is out of the universities control.
Our policy dictates that rights in data are owned by the institution, which means we should have some control over how it is stored, accessed and reported, and also have the ability to interrogate the system, should any disputes arise. The LabArchives tool ensures that data can not be manipulated or lost, can help to clarify IP and ownership issues, and also enables us to manage data access when researchers move institutions.
Notes
And finally, for researchers doing real-live imaging experiments, they could generate terabytes of data in one session, so Provisioner can provide access to large, online repositories such as Omero. This gives researchers comfort that all their experimental data can be kept, not just selected files.
Notes
Provisioner is still being developed throughout 2018. What we’ve learned so far is that:
-
Simply mandating new RDM requirements doesn’t work. Researchers are incentivised by integrated and discipline-relevant tools that save them time and give them reason to uphold good data management practices for more than just policy compliance.
-
Successful implementation requires stakeholder engagement and championship at senior levels - we are fortunate that our Associate Dean, Research (ADR) in Health has invested heavily in both our RI and RDM responsibilities and has implemented a cultural change in that Faculty, meaning that every project now has an RDMP. This is reflected in uptake of RDMPs across the institution in 2017, supported by the extensive cross-promotion of between Ethics, RI/RCR and RDM representatives at UTS.
Notes
Early feedback has been positive with researchers keen to take up the components we have rolled out because they can see immediate value in adopting them, and we can already begin to see the effectiveness of the model by the levels of uptake of our LabArchives and GitLab since their implementation in 2017.
However, there is still a way to go. Data archiving and preservation to cater for longer retention periods such as those demanded by clinical trials are still ahead on the eResearch roadmap. Data management practice is improving but is by no means universal. We accept that culture change concerning RDM is a long term proposition; but it can also be a quick win for RCR by offering an opportunity to promote research integrity in practical ways. Meanwhile, we take a risk management approach, focusing on sensitive and personal data and grants where the funding bodies require RCR and research data management.