How to round-trip "invalid XML characters" - aren't character references valid/applicable?



The following two code samples demonstrate the issue I am encountering, where "invalid characters" are neither encoded on output nor decoded on input.



var elm = new XElement("foo", "\x12");
elm.ToString();
// ArgumentException: '', hexadecimal value 0x12, is an invalid character.


Likewise, parsing a character reference to the same character fails:



var elm2 = XElement.Parse("<foo>&#x0012;</foo>");
// XmlException: '', hexadecimal value 0x12, is an invalid character ..


This causes unexpected exceptions in unexpected cases.


How can I "resolve" this such that the XML is always properly encoded without exception?


In this case it would be OK to simply drop the invalid XML characters, but I don't wish to perform that action manually for every text node inserted into the XElement structure. How can this problem be dealt with in general?
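
For reference, the manual clean-up I would like to avoid looks roughly like the sketch below. `StripInvalidXmlChars` is a hypothetical helper of my own, not part of System.Xml.Linq, and it assumes .NET 4.0 or later for `XmlConvert.IsXmlChar`. It would have to be called on every string before that string is handed to an XElement.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class XmlText
{
    // Hypothetical helper (an assumption for illustration, not a framework API):
    // returns a copy of the text without characters that XML 1.0 does not allow.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsSurrogate(c))
            {
                // Keep well-formed surrogate pairs (characters above U+FFFF);
                // a lone surrogate is dropped.
                if (char.IsHighSurrogate(c) && i + 1 < text.Length
                    && char.IsLowSurrogate(text[i + 1]))
                {
                    sb.Append(c).Append(text[i + 1]);
                    i++;
                }
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // U+0012 is removed before the text reaches the element,
        // so ToString() no longer throws.
        var elm = new XElement("foo", XmlText.StripInvalidXmlChars("a\u0012b"));
        Console.WriteLine(elm.ToString()); // <foo>ab</foo>
    }
}
```

This works, but only where I remember to call it, which is exactly the per-node bookkeeping I am hoping to avoid.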


And, if I must preserve these "invalid characters" in a round-trip, is there a standard method of doing so?




I am surprised that using a character reference did not fix the issue - isn't encoded encoded? This isn't an XElement-only issue, as online validation sites also reject the XML in the second case, although answers can rely on XElement being used.

