Exploring Creative Commons Licenses for Scholarly Metadata Records

Authors: Amir Aryani, Adrian Burton, Marta Poblet, Paolo Manghi, Kathryn Unsworth, Jingbo Wang, Brigitte Hausstein, Sunje Dallmeier-Tiessen, Claus-Peter Klas, Xiaobin Shen, Natasha Simons

Making research data connected, discoverable, and reusable is a major challenge of the research data revolution. This paper discusses the limited clarity around licensing and the lack of transparency about terms and conditions of use for research metadata. We argue that this lack of transparency hinders discovery of research data, creating a disconnect from publications and other trusted research outcomes. We also explore the application of Creative Commons licenses to research metadata, and provide some examples of this approach and its applicability to internationally known data infrastructures.


Access to trusted research data is fuelling a new scientific paradigm that relies on high connectivity and collaboration. New policies, standards, and data infrastructures provide the bases for data-driven research. Yet, finding relevant, trusted, and reusable datasets remains a challenge for many researchers. New discovery services address this issue by drawing on open public information, but the lack of transparency about licenses and terms of use for metadata records hinders investment and innovation in this domain. Metadata records are essential for data discovery and management since they describe datasets and related research information (e.g. publications, grants, and contributors). If the terms of use are unclear or ambiguous, discovery services lack basic information on how metadata records can be used, to what extent they can be transformed or augmented, or whether they can be utilised as part of commercial applications. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH; OAI 2016) offers a technical framework for metadata harvesting but no specific information on licensing or terms of use. In this paper we discuss the option to assign Creative Commons (CC) licenses to research metadata and provide some examples from several data registries.
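To illustrate the kind of check a discovery service might run on harvested records, the following is a minimal Python sketch, not a description of any registry's actual pipeline. It parses an OAI-PMH response carrying Dublin Core metadata and reports which records contain a dc:rights element. The sample response, the identifiers under example.org, and the function name rights_by_record are hypothetical. Note also that dc:rights may describe the dataset rather than the metadata record itself, which is one aspect of the ambiguity discussed in this paper.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response; identifiers and titles are made up.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dataset with a rights statement</dc:title>
          <dc:rights>https://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dataset without a rights statement</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def rights_by_record(xml_text):
    """Map each record identifier to the list of dc:rights values it carries."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        # Keep only non-empty rights statements.
        rights = [
            el.text.strip()
            for el in record.iterfind(".//dc:rights", NS)
            if el.text and el.text.strip()
        ]
        result[identifier] = rights
    return result


if __name__ == "__main__":
    for identifier, rights in rights_by_record(SAMPLE_RESPONSE).items():
        status = ", ".join(rights) if rights else "no rights statement found"
        print(f"{identifier}: {status}")
```

In this sketch the second record yields an empty list, which is the situation an aggregator faces when a contributor has not stated any terms of use.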

Applying Creative Commons Licenses to Metadata

The effective reuse of metadata across data infrastructures and research institutions requires transparency about the conditions of use. The issue of assigning clear licenses and terms of use for public research information can be addressed by assigning Creative Commons (CC) licenses to public metadata records. The most permissive CC instrument is CC0 (“No Rights Reserved”). Whilst a CC0 waiver could be applied, there are some doubts about its force under Australian law, particularly with respect to moral rights. Furthermore, the disclaimer that accompanies CC0 may, at present, be ineffective in protecting the user from liability for claims of negligence. The main issue with applying CC0 to research metadata is that the aggregator is responsible for ensuring that the original records were collected under the CC0 waiver. According to the Creative Commons definition (CC0 2016), “You should only apply CC0 to your own work, unless you have the necessary rights to apply CC0 to another person’s work.” This is why, for instance, Europeana releases all its metadata under CC0 and requires its data providers to waive all IP rights to the metadata provided. Likewise, the Digital Public Library of America (DPLA) requires all data and metadata donors to attach a CC0 waiver to any donation (Cohen 2013). Hence, unless adequate provisions are made, metadata aggregators or repositories would not be able to apply CC0 to records created by other sources.

Another popular Creative Commons license for open access works is the CC-BY license, which enables third parties to distribute the work with attribution to the original creator. A significant problem arises when assigning CC-BY licenses to aggregated metadata: the sources of metadata records are not always clear, which makes attribution difficult. Moreover, the CC-BY license has a requirement to “indicate if changes were made”, which adds to the complexity of enriching metadata by aggregators. While both CC-BY and CC0 require the consent of the original creator before they can be applied, tracking attribution is a problem distinct to CC-BY.
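The bookkeeping that CC-BY implies for an aggregator can be sketched as follows. This is an illustrative Python example under our own assumptions, not a description of how any infrastructure discussed here works: the class AggregatedRecord, its field names, and the wording of the attribution line are all hypothetical. The idea is simply that each record keeps its source and a log of enrichments, so that an attribution statement indicating whether changes were made can be generated.

```python
from dataclasses import dataclass, field


@dataclass
class AggregatedRecord:
    """Hypothetical container for a harvested metadata record."""
    identifier: str
    source: str            # contributor to credit under CC-BY
    license: str           # license stated by the contributor
    fields: dict
    changes: list = field(default_factory=list)

    def enrich(self, key, value, note):
        """Add or overwrite a field and log the change for later attribution."""
        self.fields[key] = value
        self.changes.append(note)

    def attribution(self):
        """Build an attribution line that indicates whether changes were made."""
        text = (
            f"Metadata record {self.identifier} by {self.source}, "
            f"licensed under {self.license}."
        )
        if self.changes:
            text += " Changes were made: " + "; ".join(self.changes) + "."
        return text


if __name__ == "__main__":
    record = AggregatedRecord(
        identifier="rec-1",
        source="Example Repository",
        license="CC BY 4.0",
        fields={"title": "Sample dataset"},
    )
    record.enrich("subject", "Oceanography", "added subject keyword")
    print(record.attribution())
```

If the source of a record is unknown at harvest time, no such attribution line can be produced, which is the difficulty described above.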

A question remains as to whether copyright subsists in metadata. As AusGOAL advises, “recent developments in Australia have led to the situation where it is unclear which data is subject to copyright. In this situation, Australian researchers have to take a pragmatic approach and it would seem desirable to assume copyright as subsisting in all data created in the course of research, and ensure that it is licensed accordingly. No harm can come from this approach.” (AusGOAL 2016). ANDS adds: “It will still serve as a useful way to make known how you would like to be attributed, in addition to applying a limitation of liability and warranty clause to the data.” (ANDS 2016). These statements from AusGOAL and ANDS relate to data; however, it is uncertain whether the same approaches can be applied to aggregated metadata. In cases where it is clear that copyright has never subsisted in the aggregated metadata, application by a third party (organisation) of a CC Public Domain Mark would suffice, provided the rights to do so have been established, including consideration that copyright in the material may subsist in other jurisdictions.

Case Studies

Our set of brief case studies on the application of Creative Commons licenses for metadata includes ANDS, CERN, da|ra, NCI, and OpenAIRE. Even if limited, this information can shed some light on the applicability of CC licenses to research data infrastructures.

ANDS: The Australian National Data Service (ANDS) manages Research Data Australia (RDA), a national research data registry. RDA receives contributions from more than 100 Australian research institutions, data infrastructures, and research organisations (RDA 2016). ANDS collects and publishes all metadata under an agreement with contributors by which their records will be openly available on the web (ANDS Agreement 2010); however, there is no license attached to these metadata records, as contributors often do not assign licenses to their records.

CERN: The research institute for particle physics has different platforms and services related to scholarly information. Among them: (i) INSPIRE-HEP, the main information platform in high-energy physics, aggregates scholarly information from all relevant community resources. Additionally, the service provides ‘author pages’ (with ORCID integration) compiling information about researchers from the scholarly records available on INSPIRE. The metadata on this platform are shared with a CC0 waiver, with the expectation that third parties will use the available information to build new services, such as citation statistics; (ii) the Open Data Portal publishes data and research materials accompanying datasets: documentation, software, trigger files, and tutorials to enable reuse by any interested audience. Objects are shared with liberal licenses, e.g. data and metadata with the CC0 waiver, and software with the GNU General Public License (GPL).

da|ra: A registration agency for social science and economics data in Germany. It is run by the GESIS Leibniz Institute for the Social Sciences and ZBW Leibniz Information Center for Economics, in cooperation with DataCite. This infrastructure lays the foundation for persistent identification, and reliable citation of research data via allocation of DOI names. Each DOI name is linked to a set of metadata. The da|ra Metadata Schema (Helbig et al 2014) provides a number of mandatory and optional elements that have to be submitted by the publication agent at the time of data registration. da|ra reserves the right to share metadata with information indexes and other entities. da|ra supports open metadata principles, whereby all metadata are made available under CC0 1.0 to encourage all providers (data centers, data repositories, libraries, etc.) to make their metadata available under the same terms. Since 2016 da|ra has been offering access to the metadata of the registered research data using the OAI-PMH (OAI 2016) at www.da-ra.de/oaip/.

NCI: The National Computational Infrastructure (NCI http://nci.org.au) at the Australian National University (ANU) has evolved to become Australia’s peak computing centre for national computational and data-intensive Earth system science. More recently NCI co-located 10 Petabytes of 30+ major national and international environmental, climate, earth system, geophysics and astronomy data collections to create the National Environmental Research Interoperability Data Platform (NERDIP). Data collection management has become an essential activity at NCI. NCI’s partners (CSIRO, Bureau of Meteorology, ANU, and Geoscience Australia), supported by the Australian Government and Research Data Storage Infrastructure (RDSI) and Research Data Services (RDS), have established a national data resource that is co-located with high-performance computing. Most of the data are quality assured for ‘publication’ and made accessible as services under Creative Commons Attribution (CC-BY) 4.0 as they are sourced from government agencies. The license files are published jointly with data through NCI’s OpenDAP server (http://dapds00.nci.org.au/thredds/catalog/licenses/catalog.html). The metadata associated with data collections are publicly available at http://geonetwork.nci.org.au. Aggregators such as RDA and the International Directory Network of the Committee on Earth Observation Satellites (CEOS) have harvested the metadata collection.

OpenAIRE: The OpenAIRE infrastructure (Manghi et al 2012) is the point of reference for Open Access and Open Science in Europe (and beyond). Its mission is twofold: enabling the Open Science cultural shift of the current scientific communication infrastructure by linking, engaging, and aligning people, ideas, resources, and services at the global level; and monitoring Open Access trends and measuring research impact in terms of publications and datasets to serve research communities and funders. To this aim, OpenAIRE offers services (Schirrwagen et al 2013) that collect, harmonize, de-duplicate, and enrich, by inference (text mining) or end-user feedback, metadata relating to publications, datasets, organizations, persons, projects and several funders from all over the world. To join the infrastructure, data sources sign a Terms of Agreement in which they grant the OpenAIRE services the right to collect and reuse metadata records under CC0. The graph is exported via standard protocols (e.g. HTTP-REST search, LOD, OAI-PMH) and formats, and the metadata records are available under CC-BY or CC0, with no restriction of embargo or re-use.


In this paper we discuss the application of CC licenses to research metadata and the range of approaches taken by five large registries/repositories in publishing their metadata records. Our case studies show some evidence that CC0 or CC-BY can be applied to research metadata, enabling reuse of metadata with a clear license and transparency about conditions of use. Yet, we have only examined a limited number of data infrastructures, and our case studies do not include any funding information or identifier services such as ORCID; we believe these are areas for future investigation. We also believe that whether copyright applies to metadata is an open question that requires further discussion.


