sklar.com


© 1994-2017. David Sklar. All rights reserved.

XML vs. Control Characters

The XML spec says that in XML documents, “Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.” That is:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]



This means that the control characters under 0x20 (with the exception of the Three Wise Whitespace: tab, line feed, and carriage return) are not allowed.
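To make the production concrete, here is a minimal Python sketch of a checker for it. The function names `is_xml10_char` and `illegal_chars` are my own inventions for illustration, not part of any library, and the ranges are copied straight from the Char rule above.

```python
def is_xml10_char(cp):
    """Return True if the integer code point cp matches the Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # everything up to the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # skips surrogates, stops before U+FFFE
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def illegal_chars(text):
    """List (index, code point label) pairs for characters Char does not allow."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


if __name__ == "__main__":
    # The bell (0x07) is flagged; the tab (0x09) is not.
    print(illegal_chars("a\x07b\tc"))
```

Running something like `illegal_chars` over a string before serializing it is one way to find out which characters would make the resulting document ill-formed.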



This restriction goes all the way back to the definition of a “character” in the W3C Working Draft of November 14, 1996.



Plenty of brilliance went into crafting that document, so I must be missing something extremely obvious here: why are those control characters outlawed? What is the spec's reasoning for preventing me from including this in an XML document:

<data>&#x07;</data>
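A quick way to watch a parser enforce this is the sketch below, which assumes Python's standard `xml.etree.ElementTree` module (backed by expat). The helper name `parses` is hypothetical and exists only for this example.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if doc is accepted as well-formed XML, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # A character reference to tab, one of the three permitted control characters.
    print(parses("<data>&#x09;</data>"))
    # A character reference to the bell character.
    print(parses("<data>&#x07;</data>"))
    # The raw bell character, with no reference involved.
    print(parses("<data>\x07</data>"))
```

The tab reference goes through, while both the `&#x07;` reference and the raw bell character are refused, so spelling the character as a numeric reference does not get around the restriction.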



Preventing terminal beeps from errant Ctrl-Gs?

