Our product upgrade process exports the old schema's database to XML files (JAXB serialization) and then imports them into the new schema (StAX + JAXB). Sometimes the upgrade fails with insert errors caused by values that exceed their column's maximum size, despite those values having been exported from the same table of the old database.
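For context, the import side of that pipeline boils down to handing JAXB an XMLStreamReader over the exported file. A minimal sketch of the idea (the class and file names here are illustrative, not our actual generated schema classes):

    import java.io.FileInputStream;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.annotation.XmlAttribute;
    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;

    public class ImportSketch {

        // Hypothetical exported row; the real classes come from our schema.
        @XmlRootElement
        public static class Row {
            @XmlAttribute
            public String name;
        }

        public static void main(String[] args) throws Exception {
            // StAX reader over the exported file, feeding a JAXB unmarshaller.
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new FileInputStream("export.xml"));
            Row row = (Row) JAXBContext.newInstance(Row.class)
                    .createUnmarshaller().unmarshal(xsr);
            // row.name is what later gets inserted into the new schema; on an
            // affected JDK it can come back longer than the exported value.
            System.out.println(row.name.length());
        }
    }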
This happens when deserializing the XML (in our case with JAXB, but it is not specific to JAXB): when an attribute value contains a sequence of supplementary characters (each encoded as a UTF-16 surrogate pair), a bug in the JDK's XML scanner makes the output string longer than the input:
3 characters -> (1+2+3 =) 6 characters.
6 characters -> (1+2+3+4+5+6 =) 21 characters.
(The output length is the sum of the arithmetic progression 1..n over the n source characters, i.e. the n-th triangular number.)
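On an affected JDK this is easy to reproduce with a plain StAX reader (all names below are mine); counting in code points, three surrogate-pair characters in an attribute come back as six, and six as twenty-one:

    import java.io.StringReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;

    public class SurrogateGrowthRepro {
        public static void main(String[] args) throws Exception {
            // U+10400 is outside the BMP, so it is one surrogate pair in a Java String.
            String pair = new String(Character.toChars(0x10400));
            for (int n : new int[] {3, 6}) {
                StringBuilder value = new StringBuilder();
                for (int i = 0; i < n; i++) {
                    value.append(pair);
                }
                String xml = "<r a=\"" + value + "\"/>";
                XMLStreamReader reader = XMLInputFactory.newInstance()
                        .createXMLStreamReader(new StringReader(xml));
                reader.nextTag(); // advance to the <r> start element
                String parsed = reader.getAttributeValue(0);
                // Affected JDK: prints 3 -> 6 and 6 -> 21; fixed JDK: 3 -> 3, 6 -> 6.
                System.out.println(n + " -> "
                        + parsed.codePointCount(0, parsed.length()));
            }
        }
    }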
The code below is from Java 1.7.0_45, class com.sun.org.apache.xerces.internal.impl.XMLScanner, lines 976 - 981:
    else if (c != -1 && XMLChar.isHighSurrogate(c)) {
        if (scanSurrogates(fStringBuffer3)) {
            // BUG: fStringBuffer3 still holds the pairs from previous
            // iterations, so they are all appended again here.
            stringBuffer.append(fStringBuffer3);
            if (entityDepth == fEntityDepth && fNeedNonNormalizedValue) {
                fStringBuffer2.append(fStringBuffer3);
            }
        }
    }
The fStringBuffer3 scratch buffer is not cleared between loop iterations, so every surrogate pair already scanned is appended again along with each new one.
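The straightforward fix would be to clear the scratch buffer before each surrogate scan. A sketch of the patched excerpt (my own patch, assuming XMLStringBuffer's clear() method; not an official JDK fix):

    else if (c != -1 && XMLChar.isHighSurrogate(c)) {
        fStringBuffer3.clear(); // reset the scratch buffer so old pairs don't accumulate
        if (scanSurrogates(fStringBuffer3)) {
            stringBuffer.append(fStringBuffer3);
            if (entityDepth == fEntityDepth && fNeedNonNormalizedValue) {
                fStringBuffer2.append(fStringBuffer3);
            }
        }
    }

Patching com.sun.* internals is not something we can realistically ship, which is why we need an official fix or a safe workaround.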
I checked the Java bug database and this bug is not reported there, so I am looking for a fix. Replacing the default parser with the Woodstox StAX parser works around the bug, but unfortunately that swap is too risky for us.
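For reference, the workaround amounts to handing the unmarshaller a Woodstox StAX reader instead of the JDK default (WstxInputFactory is Woodstox's XMLInputFactory implementation; file and class names are illustrative):

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;
    import com.ctc.wstx.stax.WstxInputFactory;

    public class WoodstoxImport {
        public static void main(String[] args) throws Exception {
            // Instantiate Woodstox directly instead of relying on the
            // javax.xml.stream service lookup, which returns the JDK parser.
            XMLInputFactory factory = new WstxInputFactory();
            XMLStreamReader xsr =
                    factory.createXMLStreamReader(new FileInputStream("export.xml"));
            // ... pass xsr to the JAXB Unmarshaller exactly as before ...
        }
    }

Alternatively, the system property javax.xml.stream.XMLInputFactory can be set to com.ctc.wstx.stax.WstxInputFactory so that XMLInputFactory.newInstance() picks up Woodstox without code changes.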
Has anybody encountered this issue?