We operate an OAI-PMH service for the distribution of metadata in XML. This system is based on the OAI-PMH version 2 repository framework and implements the interface as documented at http://www.openarchives.org/OAI/openarchivesprotocol.html.
The service interface can be used in different ways by:
- Public metadata users
- Metadata Plus subscribers
- Crossref members
1 Public metadata users
We allow public access to two OAI verbs, ListSets and ListIdentifiers, which allow for discovery of available information.
2 Metadata Plus subscribers
Access to the OAI verbs GetRecord and ListRecords requires a subscription to our Metadata Plus service. Subscribers are given an access token that identifies them. Identify yourself in each request by sending a “Crossref-Plus-API-Token” HTTP header containing your access token. The example below shows how the header should be formatted, with XXX replaced by your token:
Crossref-Plus-API-Token: Bearer XXX
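As an illustration, the header above could be attached to a request roughly as follows. This is a minimal sketch using Python's standard library; the helper name build_plus_request is hypothetical, and the endpoint is the one used in the example URLs in this article.

```python
import urllib.request

# Endpoint as it appears in the example URLs in this article.
OAI_ENDPOINT = "http://oai.crossref.org/oai"


def build_plus_request(url: str, token: str) -> urllib.request.Request:
    """Return a request object carrying the Metadata Plus token header.

    The request is only constructed here, not sent. The function name is
    an assumption made for this sketch.
    """
    return urllib.request.Request(
        url,
        headers={"Crossref-Plus-API-Token": f"Bearer {token}"},
    )


# Typical use (sends a real request, so it is left commented out):
# request = build_plus_request(OAI_ENDPOINT + "?verb=ListRecords&metadataPrefix=cr_unixsd", "XXX")
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Note that urllib stores header names in a normalized capitalization; HTTP header names are case-insensitive, so this does not change what the server receives semantically.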
3 Crossref members
Crossref members may also use OAI-PMH to retrieve their own registered metadata through our Deposit Harvester, signing in with their member login.
Set hierarchy
We support selective harvesting according to sets defined by the hierarchy of publisher and title. Setspecs are formatted as follows:
- content type:prefix:pubID (ex: J:10.1002:4 = Journal content by the publisher Wiley, journal title Applied Organometallic Chemistry)
- content type:prefix (ex: J:10.1002, journals owned by publisher Wiley)
The from and until dates in a request capture when a record was deposited or updated, not the published date of the item. This means a request for records from yesterday through today will return all records added or changed between then and now, regardless of the publication dates included in the records.
Set content types are:
- J for journals
- B for books, conference proceedings, dissertations, reports, and datasets
- S for series
The default set for both ListIdentifiers and ListRecords is J (journals). To retrieve non-journal data, you must specify a set explicitly: B for books and the other content types listed under B above, or S for series.
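The setspec format and content types described above can be assembled programmatically. The following is a small sketch; the function name make_setspec and its argument names are hypothetical, not part of the service.

```python
from typing import Optional, Union

# Content type codes listed in this article.
CONTENT_TYPES = {"J", "B", "S"}


def make_setspec(content_type: str = "J",
                 prefix: Optional[str] = None,
                 pub_id: Optional[Union[str, int]] = None) -> str:
    """Build a setspec of the form 'type', 'type:prefix' or 'type:prefix:pubID'."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    if pub_id is not None and prefix is None:
        raise ValueError("a pubID can only be given together with a prefix")
    parts = [content_type]
    if prefix is not None:
        parts.append(prefix)
    if pub_id is not None:
        parts.append(str(pub_id))
    return ":".join(parts)


print(make_setspec("J", "10.1002", 4))  # J:10.1002:4
print(make_setspec("J", "10.1002"))     # J:10.1002
```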
With the ListSets request, the set parameter is optional. Omitting it returns only journal data: a list of publishers, their journal titles, and each year of publication for which we have metadata records.
With the ListIdentifiers request, the set, from, and until parameters are optional. The from and until parameters refer to the dates when the DOIs were registered or updated with Crossref, not to the publication dates.
Examples
Request a list of DOIs registered since 2010-08-11:
http://oai.crossref.org/oai?verb=ListIdentifiers&metadataPrefix=cr_unixsd&from=2010-08-11
Request all journal sets:
http://oai.crossref.org/oai?verb=ListSets&set=J
Request all sets with content type 'B':
http://oai.crossref.org/oai?verb=ListSets&set=B
Request records for title '98765' with prefix 10.1234 registered or updated on 2017-07-06:
http://oai.crossref.org/oai?verb=ListRecords&metadataPrefix=cr_unixsd&set=J:10.1234:98765&from=2017-07-06&until=2017-07-06
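Request URLs like the ones above can also be generated rather than typed by hand. This sketch assumes the endpoint and parameter names shown in the examples; the helper name oai_url and its argument names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint as it appears in the example URLs in this article.
OAI_ENDPOINT = "http://oai.crossref.org/oai"


def oai_url(verb: str,
            metadata_prefix: Optional[str] = None,
            set_spec: Optional[str] = None,
            from_date: Optional[str] = None,
            until_date: Optional[str] = None) -> str:
    """Assemble an OAI-PMH request URL, skipping parameters that are not given."""
    candidates = [
        ("verb", verb),
        ("metadataPrefix", metadata_prefix),
        ("set", set_spec),
        ("from", from_date),
        ("until", until_date),
    ]
    query = [(name, value) for name, value in candidates if value is not None]
    # safe=":" keeps the colons in setspecs readable instead of encoding them as %3A.
    return f"{OAI_ENDPOINT}?{urlencode(query, safe=':')}"


print(oai_url("ListRecords", "cr_unixsd", "J:10.1234:98765",
              "2017-07-06", "2017-07-06"))
```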
Best practice for performance
We allow 3 concurrent initial OAI-PMH requests per user. There is no concurrency limit for follow-on requests (requests made with a resumption token). Because of the size of the repository, we strongly discourage running a ListRecords request against the entire collection.
The best performance comes from requesting a single day's changes for a single publication, e.g.
http://oai.crossref.org/oai?verb=ListRecords&metadataPrefix=cr_unixsd&set=J:10.1234:98765&from=2017-07-06&until=2017-07-06
If you are harvesting a large amount of data and run up against the limit of 3 concurrent initial requests, we recommend requesting data by prefix for a short time frame (from a few days up to a week). For example, this request returns all journal records owned by prefix 10.1234 that were registered or updated between 2017-07-06 and 2017-07-09:
http://oai.crossref.org/oai?verb=ListRecords&metadataPrefix=cr_unixsd&set=J:10.1234&from=2017-07-06&until=2017-07-09
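One way to follow this advice is to split a long harvest period into short, non-overlapping from/until windows and issue one request per window. The sketch below assumes that from and until are both inclusive, as the single-day example above suggests; the function name date_windows is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int = 7) -> Iterator[Tuple[str, str]]:
    """Yield (from, until) date strings covering start..end in windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    cursor = start
    while cursor <= end:
        # Both ends are treated as inclusive, so a 7-day window spans cursor .. cursor + 6.
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor.isoformat(), window_end.isoformat()
        cursor = window_end + timedelta(days=1)


for from_date, until_date in date_windows(date(2017, 7, 1), date(2017, 7, 10)):
    print(from_date, until_date)
# 2017-07-01 2017-07-07
# 2017-07-08 2017-07-10
```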
Resumption Tokens
Many OAI requests are too big to be retrieved in a single transaction. If a response contains a resumption token, you must make an additional request to retrieve the rest of the data. Resumption tokens remain valid for 48 hours.
The resumption token element states its expiration date, 48 hours after it was issued, in the expirationDate attribute:
<resumptionToken expirationDate="2015-10-28T00:00:00">c6cafedc-ef48-42a3-847c-b682dc58b617</resumptionToken>
Append the token to the end of the next request:
http://oai.crossref.org/oai?verb=ListSets&set=J:10.1007&resumptionToken=c6cafedc-ef48-42a3-847c-b682dc58b617
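The token-following loop can be sketched as below. This is an illustration only: it assumes responses use the standard OAI-PMH 2.0 XML namespace, that the initial URL already contains a query string, and that the token is appended to the original request as in the example above. The function names and the injectable fetch callable are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import quote

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_resumption_token(xml_text: str) -> Optional[str]:
    """Return the resumption token in a response, or None if absent or empty."""
    root = ET.fromstring(xml_text)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None:
        return None
    token = (element.text or "").strip()
    return token or None


def harvest(first_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield each response page, following resumption tokens until none is returned.

    `fetch` takes a URL and returns the response body as text, so the caller
    decides how requests are sent (headers, retries, and so on).
    """
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield page
        token = extract_resumption_token(page)
        url = f"{first_url}&resumptionToken={quote(token)}" if token else None
```

Because tokens expire after 48 hours, a long-running harvest that is interrupted for longer than that would need to start again from an initial request.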
Metadata Plus snapshots
Metadata Plus snapshots provide access to our 100 million plus metadata records in a single file, providing an easy way to retrieve an up-to-date copy of our records. Snapshots are available for Metadata Plus service users.
The files are made available via a /snapshots route in the REST API, which offers a gzip-compressed tar file (.tar.gz) containing the full extract of the metadata corpus in either JSON or XML format.
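Because a snapshot is a single large .tar.gz archive, it is usually more practical to stream through it than to unpack it completely. The sketch below assumes the archive has already been downloaded to a local path; the layout and file names inside the archive are not described in this article, so the code makes no assumption about them, and the function name is hypothetical.

```python
import tarfile
from typing import Iterator, Tuple


def iter_snapshot_members(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, raw bytes) for each regular file in a .tar.gz archive.

    Members are read one at a time, so the archive is never fully
    extracted to disk.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other special entries
            handle = archive.extractfile(member)
            if handle is not None:
                yield member.name, handle.read()


# Typical use, with a hypothetical local file name:
# for name, payload in iter_snapshot_members("snapshot.tar.gz"):
#     print(name, len(payload))
```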
Sample Files
A sample application for harvesting Crossref OAI data is available below (oaipmhRequest.zip). Sample ListIdentifiers and ListSets responses are also available (oaipmhSamples.zip).