XML vs. Control Characters
29 Jun 2006

The XML spec says that in XML documents, “Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.” That is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
This means that the control characters below 0x20 (with the exception of the Three Wise Whitespace: tab, line feed, and carriage return) are not allowed.
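To make the rule concrete, the `Char` production can be read as a simple predicate on code points. Here's a minimal sketch in Python; `is_xml_char` and `illegal_chars` are hypothetical helper names of my own, not part of any library, and the ranges are copied straight from the production above.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)           # the Three Wise Whitespace
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD       # skips surrogates, #xFFFE and #xFFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for each character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml_char(ord(ch))
    ]


print(illegal_chars("tab\there, bell\x07there"))  # [(14, 'U+0007')]
```

The tab passes, the Ctrl-G (U+0007) does not.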
This restriction goes all the way back to the definition of a “character” in the W3C Working Draft of November 14, 1996.
Plenty of brilliance went into crafting that document, so I must be missing something extremely obvious here: why are those control characters outlawed? What is the reasoning behind the spec preventing me from including in an XML document:
<data></data>
Preventing terminal beeps from errant Ctrl-Gs?
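And parsers do enforce it. A quick check with Python's standard `xml.etree.ElementTree` (which sits on top of expat) shows the difference; `parses` is just a hypothetical helper for this illustration. Note that escaping doesn't help either: the spec's "Legal Character" well-formedness constraint also applies to character references, so `&#7;` is rejected just like a raw Ctrl-G.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """True if ElementTree (backed by expat) accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses("<data>\t</data>"))    # True: tab is one of the allowed three
print(parses("<data>\x07</data>"))  # False: a raw BEL is not a Char
print(parses("<data>&#7;</data>"))  # False: the reference is illegal too
```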