Create XML with just ISO 8859-1 characters



I am looking for the most efficient way to create an XML file in ISO 8859-1 with all non ISO 8859-1 characters replaced. My system supports the full Unicode character set, but the receiving system only supports ISO 8859-1 characters, so it is not possible to use entity encoding in the file.


I tried using code like this:



XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.GetEncoding("ISO-8859-1");
settings.ConformanceLevel = ConformanceLevel.Document;
settings.Indent = true;
XmlWriter writer = XmlWriter.Create(outputStream, settings);


This creates an XML file in ISO 8859-1 encoding, but with non ISO 8859-1 characters entity encoded; for instance, the trademark character (™) is replaced by &#x2122;.


What I want is the behavior built into the Encoding class, which replaces illegal characters with a ? character. The problem is that the XmlWriter does entity substitution before the Encoding class can do its magic, and I haven't found a way to tell it not to do entity substitution.


An example of what Encoding does:



String test = "Test 日本語";
byte[] byteArray = Encoding.GetEncoding("ISO-8859-1").GetBytes(test);
string result = Encoding.GetEncoding("ISO-8859-1").GetString(byteArray);


The result of this is the string "Test ???", which is exactly what I want.


As the file content can be quite large, I do not want to use the XmlWriter to create a string that I then convert with code similar to the above. I am looking for a solution where the XmlWriter can write directly to the output stream.


Any help is greatly appreciated.
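
One direction that might be worth trying is sketched here, not as a confirmed solution. It assumes that an XmlWriter created over a TextWriter leaves the byte encoding to that TextWriter, so a StreamWriter built with an ISO 8859-1 encoding and an explicit replacement fallback would turn unsupported characters into ? while still streaming straight to the output. The class name Latin1XmlSketch, the method WriteSample and the element name root are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Latin1XmlSketch
{
    // Hypothetical helper: writes a one-element document to outputStream.
    // Assumption: the XmlWriter passes characters through to the TextWriter,
    // and the StreamWriter's encoder applies the '?' replacement.
    public static void WriteSample(Stream outputStream, string text)
    {
        // Explicit replacement fallback, so the result does not depend on
        // the runtime's default fallback for this encoding.
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.ConformanceLevel = ConformanceLevel.Document;
        settings.Indent = true;
        // settings.Encoding is not set: with a TextWriter as the target,
        // the TextWriter's own encoding is the one that matters.

        // Disposing the StreamWriter also closes outputStream.
        using (StreamWriter textWriter = new StreamWriter(outputStream, latin1))
        using (XmlWriter writer = XmlWriter.Create(textWriter, settings))
        {
            writer.WriteStartElement("root");
            writer.WriteString(text);
            writer.WriteEndElement();
        }
    }
}
```

Calling Latin1XmlSketch.WriteSample(outputStream, "Test 日本語") should then stream the document without building the whole XML as a string first. Whether the output is acceptable to the receiving system would still need to be checked against a real file.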

