6.4.3 Escaping of Names

Not every item name is a valid XML name. In particular, even though a content repository prefix is always a valid XML prefix, the content repository local name (the part after the colon, or the whole name, if there is no prefix) may not be a valid XML name. For example, a content repository name may contain spaces, whereas XML names cannot.

Consequently, for document view serialization, each content repository name is converted to a valid XML name (as defined by XML 1.0) by translating invalid characters into escaped numeric entity encodings.

The escape character is the underscore (“_”). Any invalid character is escaped as _xHHHH_, where HHHH is the four-digit hexadecimal UTF-16 code for the character. When producing escape sequences the implementation should use lowercase letters for the hex digits a-f. When unescaping, however, both upper and lowercase alphabetic hexadecimal characters must be recognized.
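As an illustration of the _xHHHH_ format only, the following minimal Python sketch encodes a single character and decodes a single escape sequence. The function names escape_char and unescape_sequence are hypothetical and are not defined by this specification. Emitting one escape sequence per UTF-16 code unit for characters outside the Basic Multilingual Plane is an assumption of the sketch.

```python
import re

# One escape sequence: underscore, "x", four hex digits (either case), underscore.
ESCAPE_RE = re.compile(r"_x([0-9a-fA-F]{4})_")


def escape_char(ch: str) -> str:
    """Encode one character as _xHHHH_, using lowercase hex digits.

    Assumption: a character outside the BMP is written as two sequences,
    one per UTF-16 code unit.
    """
    data = ch.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"_x{unit:04x}_" for unit in units)


def unescape_sequence(seq: str) -> str:
    """Decode a single _xHHHH_ sequence, accepting upper- and lowercase hex."""
    match = ESCAPE_RE.fullmatch(seq)
    if match is None:
        raise ValueError(f"not an escape sequence: {seq!r}")
    return chr(int(match.group(1), 16))
```

Producing lowercase while accepting both cases keeps the output uniform without rejecting names written by other implementations.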

Escaping and unescaping are performed by parsing the name from left to right.

The underscore character (“_”), when appearing as a literal, is itself escaped (as _x005f_) if it is followed by xHHHH, where H is one of the following characters: 0123456789abcdefABCDEF.
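The left-to-right procedure, including the rule for literal underscores, might be sketched in Python as follows. This is a non-normative illustration: escape_local_name, unescape_local_name and the helper names are hypothetical. The validity test is a deliberately simplified, ASCII-only stand-in for the XML 1.0 Name productions, so the sketch escapes some characters that XML would in fact allow.

```python
import re

# "xHHHH" directly after an underscore makes that underscore look like an escape.
_ESCAPE_TAIL = re.compile(r"x[0-9a-fA-F]{4}")
# A complete escape sequence; hex digits may be upper- or lowercase.
_ESCAPE = re.compile(r"_x([0-9a-fA-F]{4})_")


def _is_valid(ch: str, first: bool) -> bool:
    # ASSUMPTION: simplified ASCII-only approximation of XML 1.0 name rules.
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return True
    return not first and ch.isascii() and (ch.isdigit() or ch in ".-")


def _encode(ch: str) -> str:
    # One _xHHHH_ per UTF-16 code unit, lowercase hex (non-BMP handling assumed).
    data = ch.encode("utf-16-be")
    units = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return "".join(f"_x{unit:04x}_" for unit in units)


def escape_local_name(name: str) -> str:
    """Scan left to right, escaping invalid characters and ambiguous underscores."""
    out = []
    for i, ch in enumerate(name):
        if ch == "_" and _ESCAPE_TAIL.match(name, i + 1):
            out.append("_x005f_")  # literal underscore that would read as an escape
        elif _is_valid(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append(_encode(ch))
    return "".join(out)


def unescape_local_name(name: str) -> str:
    """Replace each escape sequence, left to right, with the character it encodes."""
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)
    # Recombine surrogate pairs that were written as two escape sequences.
    return decoded.encode("utf-16-be", "surrogatepass").decode("utf-16-be")
```

Because only the underscore that begins an apparent escape sequence is rewritten, names that merely contain underscores pass through unchanged.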

So, for example,