All XML submitted to our system must be UTF-8 encoded. There are two ways to include a special Unicode character in a Crossref deposit XML file:
- Use a UTF-8 editor or tool when creating the XML and insert characters directly into the file, which results in a sequence of one or more bytes per character in the file.
For example, an "S" with a háček (Š) has a decimal value of 352, which is 160 in hex. This value converts to the two-byte UTF-8 sequence C5 A0 (hex). You can create a small XML file in which you insert this two-byte sequence (shown here between the <UTF_encoded> tags).
<?xml version="1.0" encoding="utf-8" ?>
<UTF_encoded>Š</UTF_encoded>
The character displays properly in a browser, but if you save the XML source and try to view it in certain editors, it will not display correctly.
- Encode the special character using a numerical representation. This is the preferred approach: you construct a numeric character reference (a numerical entity) in the XML whose value is the numerical value of the character. For example, <surname>&#x160;umbera</surname> encodes the special character "S" with a háček (Š).
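The arithmetic behind both approaches can be checked with a short Python sketch. It is an illustration only, not part of any Crossref tooling; the helper name `to_numeric_refs` is hypothetical, and the choice to replace every non-ASCII character is an assumption made for the example.

```python
# Illustrative sketch: inspect the character "Š" and build its numerical form.
ch = "\u0160"                       # "S" with a háček

code_point = ord(ch)                # decimal value of the character
utf8_bytes = ch.encode("utf-8")     # byte sequence written by a UTF-8 editor
hex_ref = f"&#x{code_point:X};"     # hex numerical entity
dec_ref = f"&#{code_point};"        # equivalent decimal numerical entity


def to_numeric_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a hex
    numerical entity and leave ASCII characters untouched."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


print(code_point, utf8_bytes.hex(" ").upper(), hex_ref, dec_ref)
print(f"<surname>{to_numeric_refs('Šumbera')}</surname>")
```

Python's built-in `"Šumbera".encode("ascii", "xmlcharrefreplace")` performs the same substitution but emits the decimal form (`&#352;`), which is equivalent to the hex form.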
Schema-based XML does not support named character entities (sometimes referred to as HTML-encoded characters). For example, &eacute; and &ndash; are not allowed. To include these characters you must use their numerical representations, &#xE9; and &#x2013; respectively. These are called numerical entities, as signified by the presence of the '#' character. The 'x' following the pound sign indicates that the value is in hex; if the 'x' is omitted, the value is decimal. All entities must end with the ';' character.
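If existing text already contains named entities, one possible way to convert them is sketched below in Python, using the standard library's `html.entities.name2codepoint` table. The function name `named_to_numeric` is hypothetical. The sketch assumes that the five entities predefined by XML itself (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) should be left alone and that unrecognized names should pass through unchanged for manual review.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML; assumed here to need no conversion.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity such as &eacute; (numerical entities start with '#'
# and are therefore not matched).
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Hypothetical helper: rewrite named entities as hex numerical entities."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)   # leave untouched
        return f"&#x{name2codepoint[name]:X};"

    return _NAMED_ENTITY.sub(replace, text)


print(named_to_numeric("caf&eacute; 1990&ndash;1995 &amp; more"))
```

The hex and decimal forms refer to the same character: `int("E9", 16)` is 233, so `&#xE9;` and `&#233;` are interchangeable.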
Character numerical values may be found on the Unicode web site at The Unicode Character Code Charts. For more information about Unicode and UTF-8, see UTF-8 and Unicode FAQ for Unix/Linux and The ISO 8859 Alphabet Soup.
Some style/face markup is supported by the Crossref schema, but we recommend minimal use of this capability. Use style markup only when it is essential to the meaning of the text.