Did you say you “own” this data? You keep using that word. I do not think it means what you think it means.

In this post I question the use of the word ‘own’ in relation to research data. Is it misleading to talk about owning data? This came up as I was doing research into policies and procedures for research data management, in the context of projects funded by the Australian National Data Service, designed to promote data re-use and sharing. Feedback wanted!

(Disclaimer: I discuss here issues that research organizations have to work out at a policy level and I am certainly not going to attempt to do that here and now. This is my private blog. I am not a lawyer. I’m working at UWS with others on policy in this area, alongside the work we’re doing on a Research Data Repository but this is not UWS speaking.)

Richard Stallman famously urges us to reject the term Intellectual Property – ‘IP’ to its friends – on the grounds that it confuses and conflates several legal frameworks under one term: Did You Say “Intellectual Property”? It’s a Seductive Mirage. People have an intuitive sense of what property is like and who has rights to do various things with it. Problem is, when it comes to rights in intangibles the intuitive view is usually wrong, and talking about ‘my IP’ gives a false sense of propriety, a sense that somehow the products of one’s intellect can be locked up like a house, or fitted with an immobiliser like a car. I tend to agree with Stallman that IP is not usually a useful generalisation but I’ll use it here, because the point of this post is to discuss property rights in data.

Now, to the point: the concept of ownership of data. I’m sure I’ve talked about owning data, and I heard it a lot last week at an eResearch meeting about research data management in Sydney. Every time I hear it now I wonder, what does it really mean to own data? It is certainly not about who owns the disk drives or the USB memory sticks or the paper on which the data are stored. When you talk about ownership of something that falls under the banner ‘IP’, you need to specify which Intellectual Property right you own.

Think about this: if you wrote a book, would you talk in a casual setting about ‘owning’ it? No, at a party you’d say “I’ve written a novel”. Or with a research paper, you’d likely talk about having had it published, rather than owning it (not least because it’s likely you don’t own the copyright). There’s an inherent recognition of how copyright works in the way people refer to these things: ‘owning’ is usually ‘owning a copy’, not ‘owning the rights to’. But the way we talk about data doesn’t seem to be so nuanced; you do hear people talking about owning data rather than having collected it, compiled it, or being responsible for its preservation and upkeep.

I am not a lawyer, but as far as I can tell in Australia, there are two types of Intellectual Property[1] in data that you might own.

  • You can talk about ownership of copyright in a data set that has had sufficient creativity put into its compilation to make it a creative work (and no, nobody can tell you for sure what qualifies). Many advocates of open access research reject the use of copyright as a way of controlling research data (for example this group), but note that in Australia official advice is to use copyright licenses (see below).

  • Or there’s confidential information, or trade secrets. That is data that you take reasonable measures to protect, and to which you restrict others’ access through contracts limiting their rights. I have not seen much discussion of this aspect of IP law in the eResearch area but it would seem reasonable to this non-expert non-lawyer that if you keep data private, lock your office and don’t publish it on the internet then you would be able to complain if someone took a copy (or the only copy, even) of some data. If you put it on the internet without a license or waiver, less so.

Should we be talking about ‘owning’ data?

Should we talk about data creators, compilers, custodians, users and so on, but avoid the term ‘owners’ except when properly qualified? For example “Copyright in collections of data compiled by <insert-institution-here> employees is normally owned by the University”. Or, “All data are to be considered confidential and must not be shared except with an explicit contract specifying terms of use, or organized into collections and placed under an open licence”.

In conclusion, what Stallman says about the term Intellectual Property applies just as much to the word ‘own’:

The term [IP] carries a bias that is not hard to see: it suggests thinking about copyright, patents and trademarks by analogy with property rights for physical objects.

(Please comment below if you have a good definition of data ownership and/or you disagree.)

Postscript, what to tell researchers?

Curmudgeonly rants about terminology aside, what should researchers do?

In the context of the above disclaimer, my thoughts:

  • Don’t talk about ‘ownership’ of data without qualification as to which kind of ownership you mean – others may make completely wrong assumptions about what you mean. Remember that data collections are not subject to the same laws as physical property, and that the thresholds for copyright in data differ from those for creative works like research papers.

  • Do consider how you would like to share, manage, preserve and re-use data to further the cause of research, to make sure findings can be validated, and to be cited as a data creator or compiler. (There are lots of reasons to share. Let’s assume those conversations are taking place as they should be.)

  • Never share data with anyone without an explicit statement of what your expectations are for how it can or cannot be used, re-used, cited and disseminated.

    The DCC guide is a good starting point for understanding the issues. And in Australia there are the Australian National Data Service (ANDS) guides.

    For openly available data in Australia the recommendation from ANDS is to use Creative Commons licenses, which are copyright-based licences (even though there is a degree of uncertainty around the extent of copyright in data). The CC licenses give you a way to express the terms under which you would like to share data[2].

    For confidential or commercially sensitive data that can’t be shared openly, talk to your office of commercialisation about appropriate contractual arrangements for data sharing.

Copyright Peter Sefton. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

[1] See http://www.ipaustralia.gov.au/understanding-intellectual-property/what-is-ip/ip-protection/

Types of IP protection

IP rights give you the exclusive legal right to take advantage of your IP and help you prevent others infringing it. There are different types of IP protection in Australia, each with its own legislation.

  • Patents – for new or improved products or processes.

  • Trade marks – for letters, words, phrases, sounds, smells, shapes, logos, pictures, aspects of packaging or a combination of these, to distinguish the goods and services of one trader from those of another.

  • Designs – for the shape or appearance of manufactured goods.

  • Plant breeder’s rights – for new plant varieties.

  • Copyright – for original material in literary, artistic, dramatic or musical works, films, broadcasts, multimedia and computer programs.

  • Circuit layout rights – for the three-dimensional configuration of electronic circuits in integrated circuit products or layout designs.

  • Confidentiality/trade secrets – including know-how and other confidential or proprietary information.

[2] See  AusGOAL — Australian Governments Open Access and Licensing Framework which says:

Nothing prevents the AusGOAL suite of licences being applied to data.   The law with respect to the subsistence of copyright in factual information (for example, data) is currently being tested before the Courts.  The outcomes of  this litigation may cause further refinement of the application of Creative Commons licences. It is also important to note that not all data would fall into the spectrum being considered in the present litigation.  Until the litigation is resolved, the Creative Commons licences remain an effective tool for the licensing of data. 

In any event, Creative Commons licences will continue to adequately serve the purpose of identifying attribution requirements of the creator or publisher, and provide other benefits such as a positive and prominent notice of terms and conditions of use of the information to which they are applied.  The AusGOAL Framework will be updated and or modified to address the outcomes of the current legal proceedings when they come to an end.

  • Steve Bennett

    In the academic context (as opposed to, say, corporations), surely “ownership” can extend beyond just what is legally enforceable. After all, academics don’t sue each other. So just as one academic “stealing” another’s idea is not true theft, “owning” data might mean something like “did all the hard work to acquire, and hence deserves the privilege of publishing first, being acknowledged” etc.

    Do you have any examples of this use of “ownership” that we could dissect? I’ll make up a couple:

    1) “I don’t own the data, so I’m not allowed to give you access to it.”
    2) “Only the owner of the data should provide a RIF-CS record to ANDS”.

    The first one to me is much like when someone “owns” a Github repository, or a blog or something. It’s the (perceived?) right to control access to it.

    The second one is designating a single custodian.

    I don’t really see IP issues like copyright at play here.

    • http://ptsefton.com ptsefton

      Those are good examples – I can see people saying them, but unless we unpick what they mean then it is hard to formulate policy or clear advice.

      If ANDS say (2) then I think they need to be clear about what they mean by owner – if it’s not copyright then what does it mean? How do I know if I am the “owner”?

      If I am talking to researchers I want to make sure there is as little room for ambiguity as possible. I think people making decisions about sharing data need to be aware of the issues, and talking with specific reference to legal frameworks vs community norms is part of that.

      “I collected this, I collated this, I am planning to keep this for at least ten years, I am sharing this with the world with the expectation that…” these are all more useful in my opinion than “I own this”.

      I have some concrete examples from policy at my own university – the policies are all public, but I won’t attempt to air them or interpret them here.

      In general I will say that, in my small survey of policies and draft policies at a variety of Australian universities, I have found statements about creative works to be much more clearly worded as to legal rights (i.e. they talk about ownership of copyright) than most materials about research data.

  • Ingrid Mason

    Peter, great to see this blog post. Perhaps this is the source of a good series of papers for eResearch Australasia 2012?

    It’s interesting you’re talking about ownership and control, but not responsibility or custodianship of research data. There is an implied responsibility of the researcher (and the research institution) to the wider community that throws all this debate about ownership into a more fruitful discussion around purpose, to use, to reuse, to keep in case it might be useful, in the short, medium and long term.  I’m glad you end up pointing people to licences and it would be good to have wider discussion around the release of data and ethics – and – (sigh) data curation.

    Do you create data for your own ends?  If the ends are mixed, i.e. yours and the organisation you work for, or whoever paid for it, and the rest of society, then that’s a very mixed bag of responsibilities, and potential points of use over time.  If the use is clear now, but unclear further down, who gets the responsibility of keeping (or deleting) the data, and what role do they pick up, if they’re not the owner, but the custodian, but still enabling access, are they then a pirate?  All the knowledge around this is the meat and potatoes of those that work the library/information world – information management.

    So much of this discussion of IP in research and data is around the commercialisation, and there’s a question in my mind around whether that is the exception that proves the rule, and clouds the issue of the wider process around data generation and reuse, information generation (and the emergence of new knowledge). How much data is commercialisable, when and in what setting near to the time of production (0-2 years)? Just ask one question, at what point is asserting IP and control useful and to whom? The point at which data, when access is constrained, can generate income, for a limited party, which can then also benefit society by providing a product or service. That capacity to realise may be for 2 years or 200 years, depending on how well the IP laws support those efforts. Forcing a situation of scarcity overall really slows information generation right down. We already know that – that’s why libraries were developed – to coordinate selection of and access to information – where it was understood that enabling access to information and respecting property need careful balance and management. John Quiggin wrote a report for the cultural sector for gov2 about the release of cultural metadata that might be worth having a read of.

    Eli Neiburger at VALA 2012 talked about focusing on “fair use” rather than copyright, when dispensing advice. That has certainly reframed this debate for me, in that when I think about what seems to be a “fair use” I come up with some useful thoughts for advice for data generators and data reusers – the researchers at both ends of the equation. It has also made me wonder, from a legal and research methods (especially sampling) perspective: if you create something that someone else relies on, what onus is there on you to declare authenticity and integrity, or to dispense with that responsibility when you release it? Should researchers be given advice, when releasing their own or in fact reusing others’ data, about the validity and viability of that data reuse?


    (Disclaimer: I’m commenting as a professional and these are my thoughts and opinions and are not to be read as those of my employer)

  • Susan

    Nothing prevents the AusGOAL suite of licences being applied to data. The law with respect to the subsistence of copyright in factual information (for example, data) is currently being tested before the Courts.

    Very curious thought