Making research data connected, discoverable, and reusable is a major challenge of the research data revolution. This paper discusses the limited clarity around licensing and the lack of transparent usage terms and conditions for research metadata. We argue that this lack of transparency hinders discovery of research data and disconnects it from publications and other trusted research outcomes. We also explore the application of Creative Commons licenses to research metadata, and provide examples of this approach and its applicability to internationally known data infrastructures.
Applying Creative Commons Licenses to Metadata
Another popular Creative Commons license for open access works is the CC-BY license, which enables third parties to distribute the work with attribution to the original creator. A significant problem arises when assigning CC-BY licenses to aggregated metadata: the sources of metadata records are not always clear, which makes attribution difficult. Moreover, the CC-BY license requires licensees to “indicate if changes were made”, which adds to the complexity of enriching metadata by aggregators. While both licenses, CC-BY and CC0, require the consent of the original creator before the license is applied, tracking attribution is a problem specific to CC-BY.
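The attribution problem can be illustrated with a minimal sketch. It assumes a hypothetical record structure (`MetadataRecord`) and hypothetical helpers (`enrich`, `attribution`) that are not part of any infrastructure discussed in this paper. The sketch shows why an aggregator must carry the source and a change log alongside each CC-BY record, while a CC0 record needs neither.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical aggregated metadata record (illustration only)."""
    identifier: str
    title: str
    license: str                   # e.g. "CC0-1.0" or "CC-BY-4.0"
    source: Optional[str] = None   # original contributor, if known
    changes: Tuple[str, ...] = ()  # notes on enrichment by the aggregator


def enrich(record: MetadataRecord, note: str) -> MetadataRecord:
    """Return a copy of the record with one more change note attached."""
    return replace(record, changes=record.changes + (note,))


def attribution(record: MetadataRecord) -> Optional[str]:
    """Build an attribution statement, or None when the license does not require one."""
    if record.license.startswith("CC0"):
        return None  # CC0 waives rights, so no attribution is legally required
    if record.license.startswith("CC-BY"):
        if not record.source:
            # This is the aggregation problem: the source is unknown.
            raise ValueError(f"cannot attribute {record.identifier}: source unknown")
        text = f'"{record.title}" by {record.source}, licensed under {record.license}.'
        if record.changes:
            text += " Changes: " + "; ".join(record.changes) + "."
        return text
    raise ValueError(f"unsupported license: {record.license}")


record = MetadataRecord("id-1", "Sample dataset", "CC-BY-4.0", source="Example Repository")
print(attribution(enrich(record, "added subject keywords")))
```

A CC-BY record that reaches the aggregator without a known source cannot be attributed at all in this sketch, which mirrors the difficulty described above.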
There remains a question as to whether copyright subsists in metadata. As AusGOAL advises, “recent developments in Australia have led to the situation where it is unclear which data is subject to copyright. In this situation, Australian researchers have to take a pragmatic approach and it would seem desirable to assume copyright as subsisting in all data created in the course of research, and ensure that it is licensed accordingly. No harm can come from this approach.” (AusGOAL 2016). ANDS adds to this: “It will still serve as a useful way to make known how you would like to be attributed, in addition to applying a limitation of liability and warranty clause to the data.” (ANDS 2016). These statements from AusGOAL and ANDS relate to data; however, it is uncertain whether the same approaches can be applied to aggregated metadata. In cases where it is clear that copyright has never subsisted in the aggregated metadata, application of a CC Public Domain Mark by a third party (organisation) would suffice, provided the rights to do so have been established, including consideration that copyright for the material may subsist in other jurisdictions.
Our set of brief case studies on the application of Creative Commons licenses to metadata includes ANDS, CERN, da|ra, NCI, and OpenAIRE. Although limited, this information sheds some light on the applicability of CC licenses to research data infrastructures.
ANDS: The Australian National Data Service (ANDS) manages Research Data Australia (RDA), a national research data registry. RDA receives contributions from more than 100 Australian research institutions, data infrastructures, and research organisations (RDA 2016). ANDS collects and publishes all metadata under an agreement with contributors by which their records will be openly available on the web (ANDS Agreement 2010); however, there is no license attached to these metadata records, as often contributors do not assign licenses to their records.
CERN: The research institute for particle physics has different platforms and services related to scholarly information. Among them: (i) INSPIRE-HEP, the main information platform in high-energy physics, aggregates scholarly information from all relevant community resources. Additionally, the service provides ‘author pages’ (with ORCID integration) compiling information about researchers from the scholarly records available on INSPIRE. The metadata on this platform are shared with a CC0 waiver, with the expectation that third parties will use the available information to build new services, such as citation statistics; (ii) the Open Data Portal publishes data and the research materials accompanying datasets (documentation, software, trigger files, and tutorials) to enable reuse by any interested audience. Objects are shared under liberal licenses, e.g. data and metadata with the CC0 waiver, and software with the GNU General Public License (GPL).
da|ra: A registration agency for social science and economics data in Germany. It is run by the GESIS Leibniz Institute for the Social Sciences and the ZBW Leibniz Information Center for Economics, in cooperation with DataCite. This infrastructure lays the foundation for persistent identification and reliable citation of research data via allocation of DOI names. Each DOI name is linked to a set of metadata. The da|ra Metadata Schema (Helbig et al 2014) provides a number of mandatory and optional elements that have to be submitted by the publication agent at the time of data registration. da|ra reserves the right to share metadata with information indexes and other entities. da|ra supports open metadata principles, whereby all metadata are made available under CC0 1.0 to encourage all providers (data centers, data repositories, libraries, etc.) to make their metadata available under the same terms. Since 2016 da|ra has been offering access to the metadata of the registered research data using the OAI-PMH protocol (OAI 2016) at www.dara.de/oaip/.
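As an illustration of how license terms can travel with harvested metadata, the following minimal sketch builds an OAI-PMH `ListRecords` request and reads the Dublin Core `dc:rights` element from a response. The verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications. The base URL `https://example.org/oai`, the sample response, and the function names are assumptions made for this example, and the sketch does not describe how da|ra itself populates its records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, written by hand for this example.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical survey dataset</dc:title>
          <dc:rights>CC0 1.0</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical record without a rights statement</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_rights(xml_text):
    """Map each record identifier to the list of dc:rights values it carries."""
    root = ET.fromstring(xml_text.strip())
    rights_by_id = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        rights_by_id[identifier] = [el.text for el in record.iterfind(".//dc:rights", NS)]
    return rights_by_id


print(list_records_url("https://example.org/oai"))  # placeholder endpoint
print(extract_rights(SAMPLE_RESPONSE))
```

Records that return an empty list carry no machine-readable license statement, which is the situation this paper argues makes reuse conditions unclear.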
NCI: The National Computational Infrastructure (NCI, http://nci.org.au) at the Australian National University (ANU) has evolved to become Australia’s peak computing centre for national computational and data-intensive Earth system science. More recently, NCI co-located 10 petabytes of more than 30 major national and international environmental, climate, earth system, geophysics and astronomy data collections to create the National Environmental Research Interoperability Data Platform (NERDIP). Data collection management has become an essential activity at NCI. NCI’s partners (CSIRO, Bureau of Meteorology, ANU, and Geoscience Australia), supported by the Australian Government and Research Data Storage Infrastructure (RDSI) and Research Data Services (RDS), have established a national data resource that is co-located with high-performance computing. Most of the data are quality assured for ‘publication’ and made accessible as services under Creative Commons Attribution (CC-BY) 4.0, as they are sourced from government agencies. The license files are published jointly with the data through NCI’s OpenDAP server (http://dapds00.nci.org.au/thredds/catalog/licenses/catalog.html). The metadata associated with the data collections are publicly available at http://geonetwork.nci.org.au. Aggregators such as RDA and the International Directory Network of the Committee on Earth Observation Satellites (CEOS) have harvested the metadata collection.
OpenAIRE: The OpenAIRE infrastructure (Manghi et al 2012) is the point of reference for Open Access and Open Science in Europe (and beyond). Its mission is twofold: enabling the Open Science cultural shift of the current scientific communication infrastructure by linking, engaging, and aligning people, ideas, resources, and services at the global level; and monitoring Open Access trends and measuring research impact in terms of publications and datasets to serve research communities and funders. To this aim, OpenAIRE offers services (Schirrwagen et al 2013) that collect, harmonize, de-duplicate, and enrich, by inference (text mining) or end-user feedback, metadata relating to publications, datasets, organizations, persons, projects and several funders from all over the world. To join the infrastructure, data sources sign a Terms of Agreement in which they grant the OpenAIRE services the right to collect and reuse metadata records under CC0. The graph is exported via standard protocols (e.g. HTTP REST search, LOD, OAI-PMH) and formats, and the metadata records are available under CC-BY or CC0, with no embargo or reuse restrictions.
In this paper we discuss the application of CC licenses to research metadata and the range of approaches taken by five large registries and repositories in publishing their metadata records. Our case studies show some evidence that CC0 or CC-BY can be applied to research metadata, enabling reuse of metadata with a clear license and transparency about conditions of use. However, we have examined only a limited number of data infrastructures, and our case studies do not include any funding information or identifier services such as ORCID; we believe these are areas for future investigation. We also believe that whether copyright can be assigned to metadata is an open question that requires further discussion.
ANDS 2016, Australian National Data Service guide on Metadata stores solutions, http://www.ands.org.au/guides/metadatastoressolutions [Last accessed 5 May 2016]
ANDS Agreement 2010, Providing metadata to ANDS agreement, http://www.ands.org.au/partnersandcommunities/partners/forms [Last accessed 9 May 2016]
ANDS Guides 2016, Copyright, data and licensing guide http://ands.org.au/guides/copyrightdataandlicensing.
AusGOAL 2016, Research Data FAQs, http://www.ausgoal.gov.au/researchdatafaqs [Last accessed 9 May 2016]
CC0 2016, About CC0 – “No Rights Reserved”, https://creativecommons.org/about/cc0, [Last accessed 5 May 2016]
Cohen D 2013 CC0 (+BY), http://www.dancohen.org/2013/11/26/cc0by/ [Last accessed 5 May 2016]
Helbig, K., Hausstein, B., Koch, U., Meichsner, J. & Kempf, A. O. (2014). da|ra Metadata Schema. Version: 3.1. GESIS Technical Reports 2014/17. http://doi.org/10.4232/10.mdsdoc.3.1
Manghi P, Bolikowski L, Manola N, Schirrwagen J and Smith T 2012 OpenAIREplus: the European scholarly communication data infrastructure. D-Lib Magazine 18, pp 9-10.
OAI 2016 The Open Archives Initiative Protocol for Metadata Harvesting, https://www.openarchives.org/OAI/openarchivesprotocol.html [Last accessed 5 May 2016]
RDA 2016 Research Data Australia, https://researchdata.ands.org.au/contributors [Last accessed 5 May 2016]
Schirrwagen J, Manghi P, Manola N, Bolikowski L, Rettberg N, and Schmidt B 2013 Data curation in the OpenAIRE scholarly communication infrastructure. Information Standards Quarterly, 25(3), pp 13-19.
1 Amir Aryani (orcid.org/0000-0002-4259-9774), Adrian Burton (orcid.org/0000-0002-8099-7538), Marta Poblet (orcid.org/0000-0002-0026-989X), Paolo Manghi (orcid.org/0000-0001-7291-3210), Kathryn Unsworth (orcid.org/0000-0002-5407-9987), Jingbo Wang (orcid.org/0000-0002-3594-1893), Brigitte Hausstein (orcid.org/0000-0001-5430-8201), Sunje Dallmeier-Tiessen (orcid.org/0000-0002-6137-2348), Claus-Peter Klas (orcid.org/0000-0002-7794-7716), Xiaobin Shen (orcid.org/0000-0002-1161-8792)