XML vs. Control Characters

The XML spec says that in XML documents, “Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.” That is:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

This means that the control characters below 0x20, with the exception of the Three Wise Whitespace (tab, line feed, and carriage return), are not allowed.
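The Char production translates almost directly into a predicate on code points. Here is a minimal sketch in Python. The names is_xml_char and illegal_xml_chars are made up for illustration and come from neither the spec nor any library:

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the Char production quoted above."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # everything from space up to the surrogates
        or 0xE000 <= cp <= 0xFFFD       # skips the surrogates, stops before U+FFFE
        or 0x10000 <= cp <= 0x10FFFF    # the supplementary planes
    )


def illegal_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character that fails is_xml_char."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml_char(ord(ch))
    ]
```

Fed the string "ring\x07", illegal_xml_chars reports the BEL (Ctrl-G) at index 4, while a tab or a newline passes without complaint.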

This restriction goes all the way back to the definition of a “character” in the W3C Working Draft of November 14, 1996.

Plenty of brilliance went into crafting that document, so I must be missing something extremely obvious here: why are those control characters outlawed? What is the reasoning behind the spec preventing me from including in an XML document:


Preventing terminal beeps from errant Ctrl-Gs?

