I am looking for the most efficient way to create an XML file in ISO 8859-1 with all non ISO 8859-1 characters replaced. My system supports the full Unicode character set, but the receiving system only supports ISO 8859-1 characters, so it is not possible to use entity encoding in the file.
I tried using code like this:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.GetEncoding("ISO-8859-1");
settings.ConformanceLevel = ConformanceLevel.Document;
settings.Indent = true;
XmlWriter writer = XmlWriter.Create(outputStream, settings);
This creates an XML file in ISO 8859-1 encoding, with non ISO 8859-1 characters written as character entities; for instance, the trademark character ™ is replaced by &#x2122;.
What I want is the behavior built into the Encoding class, which replaces characters it cannot represent with a ? character. The problem is that the XmlWriter does entity substitution before the Encoding class can do its magic, and I haven't found a way to tell it not to do entity substitution.
An example of what Encoding does:
String test = "Test 日本語";
byte[] byteArray = Encoding.GetEncoding("ISO-8859-1").GetBytes(test);
string result = Encoding.GetEncoding("ISO-8859-1").GetString(byteArray);
Result of this is the string "Test ???" which is exactly what I want.
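To make the replacement behavior explicit rather than relying on whatever fallback the encoding uses by default, the same round trip can be sketched with a replacement fallback passed to Encoding.GetEncoding. The class and method names (Latin1ReplacementSketch, ToLatin1Lossy) are hypothetical and only there for illustration:

```csharp
using System;
using System.Text;

public static class Latin1ReplacementSketch
{
    // Hypothetical helper: round-trips a string through ISO 8859-1,
    // replacing every character the encoding cannot represent with "?".
    public static string ToLatin1Lossy(string input)
    {
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        byte[] bytes = latin1.GetBytes(input);
        return latin1.GetString(bytes);
    }
}
```

Calling Latin1ReplacementSketch.ToLatin1Lossy("Test 日本語") gives "Test ???", the same as the snippet above.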
As the file content can be quite large, I do not want to use the XmlWriter to create a string that I then convert with code similar to the above. I am looking for a solution where the XmlWriter can write directly to the output stream.
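One direction that might meet this requirement is to hand the XmlWriter a StreamWriter instead of the raw stream, so that the conversion to bytes (and therefore the ? replacement) is done by the StreamWriter's encoding. This is only a sketch under the assumption that the XmlWriter does not apply its own entity substitution when it writes to a TextWriter; I have not confirmed that for every framework version. The names Latin1XmlSketch and WriteSample and the sample element are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Latin1XmlSketch
{
    // Hypothetical example: streams a small document to outputStream.
    public static void WriteSample(Stream outputStream)
    {
        // Explicit "?" replacement for characters outside ISO 8859-1.
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.ConformanceLevel = ConformanceLevel.Document;
        settings.Indent = true;

        // The StreamWriter owns the character-to-byte conversion.
        // Note: disposing the StreamWriter also closes outputStream.
        using (StreamWriter textWriter = new StreamWriter(outputStream, latin1))
        using (XmlWriter writer = XmlWriter.Create(textWriter, settings))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("name", "Test 日本語 ™");
            writer.WriteEndDocument();
        }
    }
}
```

Nothing is buffered as one large string here; the XmlWriter pushes characters into the StreamWriter, which encodes them as it goes. If the assumption above holds, the unsupported characters should arrive in the stream as ? rather than as entities.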
Any help is greatly appreciated.