Similarity Check is a service that helps Crossref members prevent scholarly and professional plagiarism. It is available to eligible Crossref members and is powered by iThenticate, Turnitin’s manuscript checking service.
To participate in Similarity Check, members must include as-crawled URLs in the metadata they deposit with Crossref. This allows Turnitin to locate and index each member’s full-text PDF or HTML content and add it to the Similarity Check content database.
- Definition: What is an as-crawled URL?
- Why do I need to deposit as-crawled URLs for Similarity Check?
- Whitelisting IP addresses
- How do I deposit as-crawled URLs?
- Confirming your as-crawled URLs
- Need help using iThenticate?
An as-crawled URL is a specific crawler-friendly link used by crawler services to index content. For the Similarity Check service, an as-crawled URL needs to be deposited into the DOI metadata for each article-level DOI. The as-crawled URL must point to the location of the full-text PDF download or the full-text HTML content associated with the DOI.
Once as-crawled URLs have been deposited for at least 90% of a member’s full DOI corpus (across all the member’s journal prefixes, if applicable), the member is eligible to apply to join Similarity Check.
Turnitin (the company that provides the iThenticate service) needs the as-crawled URL for the content associated with each DOI in order to index your content as part of your Similarity Check membership agreement. Even if your as-crawled URLs are the same as your DOI resource URLs, you still need to enter them separately as specific as-crawled URLs.
Similarity Check members need to ensure their hosting domain has whitelisted Turnitin’s full IP range so that Turnitin’s indexing crawler has access to the member’s full-text content. To do this, members need to contact their hosting provider (if applicable) and ask them to enable Similarity Check indexing for their site by allowing Turnitin’s crawler to access the domain from the following IP ranges:
22.214.171.124 - 126.96.36.199
(with a subnet mask of 188.8.131.52/21)
Please also make sure that your robots.txt file does not disallow Turnitin’s user agent: TurnitinBot/2.1
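The robots.txt requirement can be checked before indexing begins. The sketch below uses Python's standard `urllib.robotparser` module to test whether a given robots.txt would let the TurnitinBot/2.1 user agent fetch a full-text URL. The function name `crawler_allowed`, the sample robots.txt contents, and the example.org URL are illustrative assumptions. Turnitin's crawler may interpret rules differently from this parser, so please treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser

# User agent string quoted in the guidance above.
TURNITIN_UA = "TurnitinBot/2.1"


def crawler_allowed(robots_txt: str, url: str, user_agent: str = TURNITIN_UA) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks Turnitin's crawler from the whole site.
blocking = """User-agent: TurnitinBot
Disallow: /
"""

# Hypothetical robots.txt that only hides an admin area from all crawlers.
permissive = """User-agent: *
Disallow: /admin/
"""

full_text_url = "https://example.org/content/article.pdf"  # placeholder URL
print(crawler_allowed(blocking, full_text_url))
print(crawler_allowed(permissive, full_text_url))
```

To check a live site, paste the contents of its robots.txt file into the first argument and use one of your own full-text URLs as the second.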
For new DOIs:
The as-crawled URL can be included as part of the standard DOI deposit metadata that a member deposits with Crossref. For Similarity Check, the as-crawled URL needs to be deposited within a collection whose property is "crawler-based", in an item whose crawler is "iParadigms".
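As an illustration, the following Python sketch builds the relevant part of a deposit record with the standard library. The "crawler-based" collection property and the "iParadigms" item crawler come from the description above. The surrounding element names (`doi_data`, `doi`, `resource`) are assumed here to follow Crossref's deposit schema and should be checked against the current schema. The DOI and URLs are placeholders, and `build_doi_data` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_doi_data(doi: str, landing_url: str, full_text_url: str) -> str:
    """Build a doi_data fragment that carries a Similarity Check as-crawled URL."""
    doi_data = ET.Element("doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    # The DOI resource URL (where the DOI resolves to).
    ET.SubElement(doi_data, "resource").text = landing_url
    # The as-crawled URL goes in a separate "crawler-based" collection,
    # even when it is identical to the DOI resource URL.
    collection = ET.SubElement(doi_data, "collection", {"property": "crawler-based"})
    item = ET.SubElement(collection, "item", {"crawler": "iParadigms"})
    ET.SubElement(item, "resource").text = full_text_url
    ET.indent(doi_data)  # pretty-printing; requires Python 3.9 or later
    return ET.tostring(doi_data, encoding="unicode")


print(build_doi_data(
    doi="10.5555/example.1",                             # placeholder DOI
    landing_url="https://example.org/articles/1",        # placeholder landing page
    full_text_url="https://example.org/articles/1.pdf",  # placeholder full-text PDF
))
```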
Remember that the URL must point to the location of the full-text content associated with that DOI, not to the article landing page, even if the content is available via a link on that page. The as-crawled URL is most commonly the PDF download link.
You can add as-crawled URLs into your already-deposited DOIs using a resource-only deposit, or by using the Supplemental-Metadata Upload option available from Crossref’s web deposit form.
If you deposit Crossref metadata via the API, please update your existing DOIs by submitting the required XML. As-crawled URLs can be submitted as part of a metadata deposit or as a resource-only deposit. Instructions for uploading resource-only deposits are available in the DOI resource deposits documentation.
For existing DOIs (.csv upload):
To update DOIs using the web deposit form, please use the following steps:
1. Format your .csv file using the instructions here: Formatting a .csv file for Similarity Check as-crawled URL deposits
2. Go to the Web Deposit Form.
3. Select the "Supplemental-Metadata Upload" option.
4. Enter your Crossref username and password in the appropriate fields.
5. Enter your email address in the appropriate field.
6. Upload your .csv file (from step 1).
7. Click "Upload CSV file" and your as-crawled URLs will be submitted to Crossref for processing.
8. You will be sent a log via email when your URLs have been processed. Please review the log to make sure your DOIs were updated.
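Before uploading, it can help to run a quick sanity check on the file. The Python sketch below assumes a simple two-column layout (DOI, then as-crawled URL) with one header row. The exact header row that Crossref requires is defined in the formatting instructions and is not reproduced here, so the `DOI,URL` header in the sample is only a placeholder. `check_rows` is a hypothetical helper, and its DOI test is a rough shape check, not a full validation.

```python
import csv
import io
from urllib.parse import urlparse


def check_rows(csv_text: str, has_header: bool = True) -> list[str]:
    """Return a list of problems found in a two-column DOI/URL CSV."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        if has_header and line_no == 1:
            continue  # skip the header row
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, found {len(row)}")
            continue
        doi, url = (cell.strip() for cell in row)
        # Rough shape check: DOIs start with "10." and contain a slash.
        if not doi.startswith("10.") or "/" not in doi:
            problems.append(f"line {line_no}: does not look like a DOI: {doi}")
        # The as-crawled URL should be an absolute http(s) link.
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"line {line_no}: not an absolute http(s) URL: {url}")
    return problems


# Placeholder data: the second record has a relative URL and is reported.
sample = (
    "DOI,URL\n"
    "10.5555/example.1,https://example.org/articles/1.pdf\n"
    "10.5555/example.2,articles/2.pdf\n"
)
for problem in check_rows(sample):
    print(problem)
```

A check like this only catches formatting slips. The log that Crossref emails after processing remains the authoritative record of which DOIs were updated.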
Reports are available on the Crossref website for all members, showing whether or not Similarity Check as-crawled URLs have been deposited for each DOI. In the Members area of our website, please select the Depositor (journals) report. Locate your organization in the list and then select the green arrow to generate a report for each of your journals.
The last two columns in this online report show the number of DOIs missing Similarity Check as-crawled URLs (please note, these are referred to as ‘iParadigms’ URLs in our system) as well as the total number of DOIs deposited for that journal. Crossref members must have deposited as-crawled URLs for at least 90% of their full DOI corpus in order to be eligible to join the Similarity Check service. If the number of as-crawled URLs missing is highlighted in red within this report, this means that more than 10% of the journal's DOIs are missing this link.
To isolate exactly which DOIs are missing as-crawled URLs, select the publication title. Any DOI with the code ‘U’ assigned is missing the Similarity Check as-crawled URL.
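The arithmetic behind the report can be sketched in Python. Given the two figures shown for each journal (DOIs missing an as-crawled URL and total DOIs deposited), the code flags journals where more than 10% of DOIs are missing the link and checks whether the member's whole corpus reaches the 90% threshold. The `JournalRow` class, the function names, and the sample figures are hypothetical. The report itself is the authoritative source.

```python
from dataclasses import dataclass


@dataclass
class JournalRow:
    title: str
    missing: int  # DOIs without a Similarity Check as-crawled URL
    total: int    # all DOIs deposited for the journal


def highlighted_red(row: JournalRow) -> bool:
    """True when more than 10% of the journal's DOIs are missing the URL."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return row.missing * 10 > row.total


def corpus_coverage(rows: list[JournalRow]) -> float:
    """Share of the member's DOIs that have an as-crawled URL (0.0 to 1.0)."""
    total = sum(r.total for r in rows)
    missing = sum(r.missing for r in rows)
    return (total - missing) / total if total else 0.0


def eligible(rows: list[JournalRow]) -> bool:
    """True when at least 90% of the full DOI corpus has an as-crawled URL."""
    total = sum(r.total for r in rows)
    missing = sum(r.missing for r in rows)
    return total > 0 and (total - missing) * 10 >= total * 9


# Invented figures: Journal B pulls the whole corpus below the threshold.
rows = [
    JournalRow("Journal A", missing=5, total=200),
    JournalRow("Journal B", missing=30, total=100),
]
for row in rows:
    print(row.title, "needs attention" if highlighted_red(row) else "ok")
print(f"corpus coverage: {corpus_coverage(rows):.1%}")  # 265 of 300 -> 88.3%
print("eligible:", eligible(rows))
```

Because eligibility is assessed across the full corpus, a single journal with many missing URLs can block an application even when every other journal is complete.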
As part of your Similarity Check membership, you will have access to iThenticate, Turnitin's manuscript checking tool. If you have questions or need help using iThenticate, please refer to the iThenticate user manual or the iThenticate API integration guide on Turnitin's help site, or contact a member of their Similarity Check support team at firstname.lastname@example.org