
Did you say you own this data? You keep using that word. I do not think it means what you think it means.


In this post I question the use of the word ‘own’ in relation to research data. Is it misleading to talk about owning data? This came up as I was doing research into policies and procedures for research data management, in the context of projects funded by the Australian National Data Service, designed to promote data re-use and sharing. Feedback wanted!

(Disclaimer: I discuss here issues that research organizations have to work out at a policy level and I am certainly not going to attempt to do that here and now. This is my private blog. I am not a lawyer. I’m working at UWS with others on policy in this area, alongside the work we’re doing on a Research Data Repository but this is not UWS speaking.)

Richard Stallman famously urges us to reject the term Intellectual Property – ‘IP’ to its friends – on the grounds that it confuses and conflates several legal frameworks under one term: Did You Say “Intellectual Property”? It's a Seductive Mirage. People have an intuitive sense of what property is like, and of who has rights to do various things with it. Problem is, when it comes to rights in intangibles the intuitive view is usually wrong, and talking about ‘my IP’ gives a false sense of propriety, a sense that somehow the products of one’s intellect can be locked up like a house, or fitted with an immobiliser like a car. I tend to agree with Stallman that IP is not usually a useful generalisation, but I’ll use it here, because the point of this post is to discuss property rights in data.

Now, to the point: the concept of ownership of data. I’m sure I’ve talked about owning data, and I heard it a lot last week at an eResearch meeting about research data management in Sydney. Every time I hear it now I wonder, what does it really mean to own data? It is certainly not about who owns the disk drives or the USB memory sticks or the paper on which the data are stored. When we talk about ownership of something that falls under the banner ‘IP’, we need to specify which Intellectual Property right is owned.

Think about this: if you wrote a book, would you talk in a casual setting about ‘owning’ it? No, at a party you’d say “I’ve written a novel”. Or with a research paper, you’d likely talk about having had it published, rather than owning it (not least because it’s likely you don’t own the copyright). There’s an inherent recognition of how copyright works in the way people refer to these things: ‘owning’ is usually ‘owning a copy’, not ‘owning the rights to’. But the way we talk about data doesn’t seem to be so nuanced; you do hear people talking about owning data rather than having collected it, or compiled it, or being responsible for its preservation and upkeep.

I am not a lawyer, but as far as I can tell in Australia, there are two types of Intellectual Property[1] in data that you might own.

Should we be talking about ‘owning’ data?

Should we talk about data creators, compilers, custodians, users and so on, but avoid the term ‘owners’ except when properly qualified? For example “Copyright in collections of data compiled by <insert-institution-here> employees is normally owned by the University”. Or, “All data are to be considered confidential and must not be shared except with an explicit contract specifying terms of use, or organized into collections and placed under an open licence”.

In conclusion, what Stallman says about the term Intellectual Property applies just as much to the word ‘own’:

The term [IP] carries a bias that is not hard to see: it suggests thinking about copyright, patents and trademarks by analogy with property rights for physical objects.

(Please comment below if you have a good definition of data ownership and/or you disagree.)

Postscript, what to tell researchers?

Curmudgeonly rants about terminology aside, what should researchers do?

In the context of the above disclaimer, my thoughts:

  • Don’t talk about ‘ownership’ of data without qualifying which kind of ownership you mean – others may make completely wrong assumptions about what you mean. Remember that data collections are not subject to the same laws as physical property, and that the thresholds for copyright in data differ from those for creative works like research papers.

  • Do consider how you would like to share, manage, preserve and re-use data to further the cause of research, to make sure findings can be validated, and to be cited as a data creator or compiler. (There are lots of reasons to share. Let’s assume those conversations are taking place as they should be.)

  • Never share data with anyone without an explicit statement of what your expectations are for how it can or cannot be used, re-used, cited and disseminated.

    The DCC guide is a good starting point for understanding the issues. And in Australia there are the Australian National Data Service (ANDS) guides.

    For openly available data in Australia the recommendation from ANDS is to use Creative Commons licences, which are copyright-based licences (even though there is a degree of uncertainty around the extent of copyright in data). The CC licences give you a way to express the terms under which you would like to share data[2].

    For confidential or commercially sensitive data that can’t be shared openly, talk to your office of commercialisation about appropriate contractual arrangements for data sharing.

Copyright Peter Sefton. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>


[1] See http://www.ipaustralia.gov.au/understanding-intellectual-property/what-is-ip/ip-protection/

Types of IP protection

IP rights give you the exclusive legal right to take advantage of your IP and help you prevent others infringing it. There are different types of IP protection in Australia, each with its own legislation.

  • Patents - for new or improved products or processes.

  • Trade marks - for letters, words, phrases, sounds, smells, shapes, logos, pictures, aspects of packaging or a combination of these, to distinguish the goods and services of one trader from those of another.

  • Designs - for the shape or appearance of manufactured goods.

  • Plant breeder's rights - for new plant varieties.

  • Copyright - for original material in literary, artistic, dramatic or musical works, films, broadcasts, multimedia and computer programs.

  • Circuit layout rights - for the three-dimensional configuration of electronic circuits in integrated circuit products or layout designs.

  • Confidentiality/trade secrets - including know-how and other confidential or proprietary information.

[2] See AusGOAL — Australian Governments Open Access and Licensing Framework, which says:

Nothing prevents the AusGOAL suite of licences being applied to data. The law with respect to the subsistence of copyright in factual information (for example, data) is currently being tested before the Courts. The outcomes of this litigation may cause further refinement of the application of Creative Commons licences. It is also important to note that not all data would fall into the spectrum being considered in the present litigation. Until the litigation is resolved, the Creative Commons licences remain an effective tool for the licensing of data.

In any event, Creative Commons licences will continue to adequately serve the purpose of identifying attribution requirements of the creator or publisher, and provide other benefits such as a positive and prominent notice of terms and conditions of use of the information to which they are applied.  The AusGOAL Framework will be updated and or modified to address the outcomes of the current legal proceedings when they come to an end.