Crossref operates an OAI-PMH service for the distribution of metadata to subscribers (Enhanced CMS). This system is based on the OAI-PMH version 2 repository framework and implements the interface as documented at http://www.openarchives.org/OAI/openarchivesprotocol.html.
We allow public access to two OAI verbs, which allow for discovery of coverage information.
The OAI-PMH service is open to Enhanced CMS subscribers only. Subscribers may provide 2 IP addresses or CIDR ranges for authentication. Token-based access is also available.
We allow 3 concurrent initial OAI-PMH requests per user. There is no concurrency limit for follow-on requests (requests made with a resumption token).
We support selective harvesting according to sets defined by the hierarchy of publisher and title. Setspecs are formatted as follows:
- content type:prefix:pubID (ex: J:10.1002:4 = Journal content by the publisher Wiley, journal title Applied Organometallic Chemistry)
- content type:prefix (ex: J:10.1002, journals owned by publisher Wiley)
The from and until dates in a request capture when a record was deposited or updated, not the published date of the item. This means a request for records from yesterday to today will return all records added or changed between then and now, regardless of the publication dates included in the records.
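The "yesterday to today" window above can be computed as follows. This is an illustrative sketch: `deposit_window` is a hypothetical helper name, and the `YYYY-MM-DD` format is the day-granularity date form defined by OAI-PMH version 2.

```python
from datetime import date, timedelta


def deposit_window(today, days_back=1):
    """Return from/until parameters covering the last `days_back` days.

    The dates select records by deposit or update date, not by the
    publication date of the item.
    """
    start = today - timedelta(days=days_back)
    return {"from": start.isoformat(), "until": today.isoformat()}


# Example: the window ending on 15 March 2024.
params = deposit_window(date(2024, 3, 15))
```

The resulting dictionary can be merged into the other request parameters before the URL is built.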
Content types are:
- J for journals
- B for books, conference proceedings, dissertations, reports, and datasets
- S for series
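The setspec format and content type codes above can be assembled with a small helper. This is a sketch only: `build_setspec` is a hypothetical name, not part of any Crossref tooling.

```python
# Content type codes as listed above.
CONTENT_TYPES = {
    "J": "journals",
    "B": "books, conference proceedings, dissertations, reports, and datasets",
    "S": "series",
}


def build_setspec(content_type, prefix=None, pub_id=None):
    """Build a setspec such as 'J', 'J:10.1002', or 'J:10.1002:4'."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    if pub_id is not None and prefix is None:
        raise ValueError("pub_id requires a prefix")
    parts = [content_type]
    if prefix is not None:
        parts.append(prefix)
    if pub_id is not None:
        parts.append(str(pub_id))
    return ":".join(parts)
```

For example, `build_setspec("J", "10.1002", 4)` produces the journal-title setspec used in the example above, and `build_setspec("J", "10.1002")` produces the publisher-level one.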
Due to the size of the repository, performing a ListRecords action for the entire collection is highly discouraged. Every ListRecords request must include a set specification.
With the ListSets request, the set parameter is optional. Leaving off the set parameter will return a listing of all publishers, all their journal titles, and each year of publication for which we have DOIs.
With the ListIdentifiers request, the set, from, and until parameters are optional. The from and until parameters specify the dates when the DOIs were registered with Crossref, not the publication dates.
The default set for both ListIdentifiers and ListRecords is J (journals). A set (B for books or conference proceedings, S for series) must be specified to retrieve non-journal data.
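The parameter rules above can be checked on the client before a request is sent. The sketch below rests on several assumptions: `BASE_URL` is a placeholder rather than the real service address, `build_params` and `build_url` are hypothetical names, and `metadata_prefix` is included because OAI-PMH version 2 generally expects a metadataPrefix argument on ListIdentifiers and ListRecords (this document does not say which value to use).

```python
from urllib.parse import urlencode

# Placeholder address (an assumption); substitute the real service endpoint.
BASE_URL = "https://oai.example.org/oai"


def build_params(verb, set_spec=None, from_date=None, until_date=None,
                 metadata_prefix=None):
    """Return the query parameters for an initial request."""
    if verb == "ListRecords" and set_spec is None:
        # The text above requires a set specification for ListRecords.
        raise ValueError("ListRecords requires a set specification")
    params = {"verb": verb}
    if metadata_prefix is not None:
        params["metadataPrefix"] = metadata_prefix
    if set_spec is not None:
        # If omitted on ListIdentifiers, the service defaults to set J.
        params["set"] = set_spec
    if from_date is not None:
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    return params


def build_url(params):
    """Encode the parameters onto the placeholder base URL."""
    return f"{BASE_URL}?{urlencode(params)}"
```

Rejecting a set-less ListRecords call locally avoids sending a request for the entire collection by accident.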
Many OAI requests are too big to be retrieved in a single transaction. If a given response contains a resumption token, the user must make an additional request to retrieve the rest of the data. Resumption tokens remain viable for 48 hours.
The resumption token includes an expiration date 48 hours after it is issued.
The token should be appended to the end of the next request.
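The resumption flow can be sketched as a loop that keeps requesting pages until no token is returned. The sketch assumes the response layout defined by OAI-PMH version 2 (a `resumptionToken` element with an `expirationDate` attribute in the OAI 2.0 namespace) and follows that protocol's rule that a follow-on request carries only the verb and the token; whether the service accepts other arrangements is not confirmed here. `next_token`, `harvest`, and the `fetch` callable are hypothetical names.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_token(response_xml):
    """Return (token, expiration_date), or None if no token is present."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        # A missing or empty element means the list is complete.
        return None
    return element.text.strip(), element.get("expirationDate")


def harvest(fetch, params):
    """Yield each response page, following resumption tokens.

    `fetch` is a hypothetical callable that takes a parameter dict and
    returns the response XML as text.
    """
    while True:
        page = fetch(params)
        yield page
        token = next_token(page)
        if token is None:
            return
        # Follow-on request: verb plus resumption token only.
        params = {"verb": params["verb"], "resumptionToken": token[0]}
```

Because tokens expire after 48 hours, a long-running harvest should process each page promptly rather than collecting tokens for later use.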
Initial data loads
As mentioned above, the size of the Crossref repository precludes using one OAI request to retrieve all the available data. Upon request, we can produce an archive of all the data available to a given subscriber, which can then be downloaded via FTP. We limit archive requests to one per year.
A sample application for harvesting Crossref OAI data is available below (oaipmhRequest.zip). Sample ListIdentifiers and ListSets responses are also available (oaipmhSamples.zip).