Multi-byte character XML entity



I'm having a problem encoding a multi-byte (supplementary) character into an XML document:



import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XmlWriter {
    static final XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
    static XMLStreamWriter streamWriter;

    // Writes s as XML character data and returns the serialized text.
    public static String Write(String s) throws XMLStreamException, UnsupportedEncodingException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamWriter = outputFactory.createXMLStreamWriter(out, "utf-16");
        streamWriter.writeCharacters(s);
        streamWriter.flush();
        // Decode with the same charset the writer encoded with, not the platform default.
        return new String(out.toByteArray(), "utf-16");
    }
}


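import junit.framework.TestCase;

public class XmlWriterTest extends TestCase {

    public void testWrite() throws Exception {
        System.out.println("Write");
        // U+10C22 expressed as a UTF-16 surrogate pair
        String s = "\uD803\uDC22";
        // Expected: one character reference for the whole code point
        String expResult = "&#x10C22;";
        String result = XmlWriter.Write(s);
        assertEquals(expResult, result);
    }
}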

I've tried many combinations of charsets, but to no avail; I keep getting this output, with each surrogate escaped separately:



&#xd803;&#xdc22;
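
For reference, the test string is a UTF-16 surrogate pair that stands for a single code point, which is why I expect one entity rather than two. A quick sketch to confirm the arithmetic (the class name `SurrogateCheck` and the method `describe` are made up for illustration):

```java
public class SurrogateCheck {

    // Reads the code point starting at index 0 and reports how many
    // UTF-16 code units (Java chars) it occupies.
    static String describe(String s) {
        int codePoint = s.codePointAt(0);
        return String.format("U+%X (%d UTF-16 code units)",
                codePoint, Character.charCount(codePoint));
    }

    public static void main(String[] args) {
        // prints "U+10C22 (2 UTF-16 code units)"
        System.out.println(describe("\uD803\uDC22"));
    }
}
```

So the high surrogate D803 and the low surrogate DC22 together encode U+10C22, and neither half is a legal XML character on its own.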



This is part of an application that generates an Excel workbook (*.xlsx); Excel fails to open the generated document because of these characters.


What can I do to achieve the correct XML entity? I was hoping that this would be handled by the XML library (the original code used Apache's StringEscapeUtils.escapeXml()).
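
The fallback I'm considering, in case the library can't do it, is to escape by code point myself. This is only a minimal sketch, not a vetted solution: `SupplementaryEscaper` and `escape` are hypothetical names, and it assumes the escaped text is written to the output as-is rather than passed through `writeCharacters` (which would escape the `&` a second time).

```java
public class SupplementaryEscaper {

    // Hypothetical helper: escapes the XML markup characters and replaces
    // every supplementary code point (above U+FFFF) with a single
    // hexadecimal character reference. Unpaired surrogates are copied
    // through unchanged, so invalid input stays invalid.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);        // joins a surrogate pair if present
            i += Character.charCount(cp);     // advance by 1 or 2 chars
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (Character.isSupplementaryCodePoint(cp)) {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase())
                  .append(';');
            } else {
                sb.append((char) cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "&#x10C22;"
        System.out.println(escape("\uD803\uDC22"));
    }
}
```

Iterating with `codePointAt` rather than `charAt` is the point of the sketch: it sees the surrogate pair as one unit, so the reference is built from the full code point instead of from each half.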

