Data & software citation is good research practice and is part of the scholarly ecosystem supporting research validation and reproducibility. Data & software citation is also instrumental in enabling the reuse and verification of these research outputs, tracking their impact, and creating a scholarly structure that recognizes and rewards those involved in producing them.
Crossref supports the propagation of data & software citations alongside a publisher’s standard bibliographic metadata. Publisher members deposit the data citation link as part of the overall publication metadata when registering their content. Crossref partners with DataCite, and together we provide a clearinghouse for the citations collected. All of these citations are made freely available to the community as open data, in both human- and machine-readable form.
Depositing data citations
While these citation practices are evolving across different communities of practice, Crossref’s offering is flexible and easily accommodates variations and changes: it does not rely on a specific set of citation metadata elements, a particular citation format, or a particular manner of credit and attribution. Publishers deposit data & software citations in their metadata deposit via a) references and/or b) relation type.
Method A: Bibliographic references
Crossref and DataCite have partnered to provide auto-update linking between publications registered with Crossref and datasets bearing DataCite DOIs. This is the most efficient and effective way to ensure that data citations are fully integrated into the scholarly research information network with full and accurate metadata. This method also ensures the widest reach possible across stakeholders in the research enterprise.
All data & software citations that include a DataCite DOI are eligible for auto-update linking. In this method, authors cite the dataset or software, including its DataCite DOI, according to the journal’s submission guidelines and add it to the article’s reference list (cf. Joint Declaration of Data Citation Principles, FORCE11 citation placement, FORCE11 Software Citation Principles). Publishers then deposit references as part of their standard practice when registering content.[1] Crossref checks every deposited reference for a DOI; if the DOI is identified as a DataCite DOI, we automatically link it to the article. With this method, no additional action is needed when publishers register their content.
[1] CitedBy restrictions for references apply to the CitedBy service and are not in effect for Crossref-DataCite auto-linking results.
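To make the reference route concrete, here is a minimal Python sketch (standard library only) that builds a reference list fragment in which a dataset citation carries its DOI. It assumes the `citation_list`, `citation`, `doi` and `unstructured_citation` element names of the Crossref deposit schema, leaves out namespaces and the rest of a full deposit, and uses placeholder reference text and a hypothetical helper name, `build_citation_list`.

```python
import xml.etree.ElementTree as ET


def build_citation_list(references):
    """Build a <citation_list> fragment from (key, doi, text) tuples.

    Simplification (assumption): the Crossref schema namespace and the
    surrounding <journal_article> element of a full deposit are omitted.
    """
    citation_list = ET.Element("citation_list")
    for key, doi, text in references:
        citation = ET.SubElement(citation_list, "citation", key=key)
        if doi:
            # Crossref checks each deposited reference for a DOI; if it is a
            # DataCite DOI, the reference is eligible for auto-update linking.
            ET.SubElement(citation, "doi").text = doi
        if text:
            # Free-text form of the reference as it appears in the article.
            ET.SubElement(citation, "unstructured_citation").text = text
    return citation_list


# Hypothetical reference list: one dataset citation with a DOI, one without.
refs = [
    ("ref1", "10.5284/1000389", "Example dataset citation (placeholder text)."),
    ("ref2", None, "Example article citation without a DOI (placeholder text)."),
]
print(ET.tostring(build_citation_list(refs), encoding="unicode"))
```

Nothing in the fragment marks the dataset as special: the DOI is tagged like any other reference, which is why this route needs no extra action from the publisher.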
Method B: Relation type
Publishers can link an article to a variety of associated research objects directly in the metadata they deposit with Crossref, including data & software, protocols, videos, published peer reviews, preprints, conference papers, etc. Doing so not only groups these digital objects together but also formally associates them with the publication. Each link is a relationship, and the sum of all these relationships constitutes a “research article nexus.” Data & software citations are a valuable part of this nexus.
To tag a data or software citation in the metadata deposit, we ask for the inter-work relationship type, a description of the dataset or software (optional), the dataset or software identifier, and the identifier type. Crossref can accommodate any identifier, though we currently validate only DOI relationships during metadata processing. The following XML snippet shows how a data citation is included in the metadata deposit for a journal article.
<program xmlns="http://www.crossref.org/relations.xsd">
  <related_item>
    <description>Acknowledgement mention of dataset use.</description>
    <inter_work_relation relationship-type="isBasedOn" identifier-type="doi">10.5284/1000389</inter_work_relation>
  </related_item>
</program>
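If deposits are generated programmatically, the relation in the snippet could be produced along these lines. This is a sketch under assumptions: it wraps the relation in the `program` and `related_item` elements of the `http://www.crossref.org/relations.xsd` namespace, the helper name `build_relation_program` is hypothetical, and the output is a fragment to embed in a full journal article deposit rather than a complete, validated file.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://www.crossref.org/relations.xsd"


def build_relation_program(identifier, relationship_type,
                           description=None, identifier_type="doi"):
    """Build a relations <program> element holding one inter-work relation.

    Hypothetical helper; identifier_type defaults to "doi" but any
    identifier type can be passed, since Crossref accepts other identifiers.
    """
    ET.register_namespace("rel", REL_NS)  # serialize with a "rel:" prefix
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    if description:
        # Optional free-text description of the dataset or software.
        ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"relationship-type": relationship_type, "identifier-type": identifier_type},
    )
    relation.text = identifier
    return program


program = build_relation_program(
    "10.5284/1000389", "isBasedOn", "Acknowledgement mention of dataset use."
)
print(ET.tostring(program, encoding="unicode"))
```

Keeping the relationship type as a plain argument leaves the choice of relation (for example "isSupplementedBy" versus "references") to the caller.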
In many cases, the dataset is generated as part of the research and described in the article; here, Crossref and DataCite recommend the “isSupplementedBy” relation type. Where the original dataset was produced by researchers other than the article’s authors (e.g., a Protein Data Bank structure published five years ago), we recommend the “references” relation type.
Two methods, two channels
The two methods are independent, so Method A (references) and Method B (relation type) can be used separately or together. Each suits a different set of conditions; see Table 1 for the benefits and limitations of each method. For now, we recommend that publishers use both methods where possible, for the best specificity and coverage.
Benefits & Limitations
Method B (relation type) enables precise tagging of the dataset and its specific relationship to the published research results. It also accommodates any type of dataset identifier.
Crossref and DataCite make the data & software citations deposited by Crossref publisher members and DataCite data repositories openly available to a wide host of parties, including both Crossref and DataCite communities as well as the extended research ecosystem (funders, research organisations, technology and service providers, research data frameworks such as Scholix, etc.).
Data & software citations from references can be accessed via the Crossref Event Data API. Citations included directly in the metadata via relation type can be accessed through Crossref’s APIs in a number of formats (REST, OAI-PMH, OpenURL). A single channel containing data & software citations from both methods, across formats, is in development and will be released in the future.
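For consumers of relation-type citations, the following sketch reads links from a work record returned by the Crossref REST API `works` endpoint. It assumes that the record’s `message` object exposes a `relation` field keyed by kebab-case relation names (for example `is-supplemented-by`) whose entries carry `id-type` and `id`; check the REST API documentation before relying on this. The helper names and the sample record are hypothetical, and `fetch_work` is shown but not called, since it needs network access.

```python
import json
import urllib.parse
import urllib.request


def fetch_work(doi):
    """Fetch one work record from the Crossref REST API (needs network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as response:
        return json.load(response)["message"]


def relation_links(work, wanted=("references", "is-supplemented-by", "is-based-on")):
    """Return (relation type, DOI) pairs from a work's "relation" field.

    Assumes kebab-case relation keys; only DOI targets are kept.
    """
    links = []
    for rel_type, targets in work.get("relation", {}).items():
        if rel_type not in wanted:
            continue
        for target in targets:
            if target.get("id-type") == "doi":
                links.append((rel_type, target.get("id")))
    return links


# Hypothetical record shaped like a REST API "message" (not real data).
sample_work = {
    "DOI": "10.1234/example-article",
    "relation": {
        "is-supplemented-by": [
            {"id-type": "doi", "id": "10.5284/1000389", "asserted-by": "subject"}
        ],
        "has-review": [
            {"id-type": "doi", "id": "10.1234/example-review", "asserted-by": "object"}
        ],
    },
}
print(relation_links(sample_work))  # → [('is-supplemented-by', '10.5284/1000389')]
```

Separating the fetch from the parsing keeps the filtering logic testable offline and lets the same function run over records retrieved in bulk.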