While cleaning an XML file with TagSoup I got unexpected results: TagSoup closes the parent tag too soon, orphaning some of its child elements. It also lowercases the parent tag's name.
Before tagsoup:
<Objects>
<Object>
<ObjectID>240</ObjectID>
[...]
<Status>Not Ready</Status>
<Title>Some description which includes word/word, 22,000</Title>
<Url>http://ift.tt/1wDs0ae;
[...]
<Owner>
<Name>JOHN MARSHALL, MR</Name>
</Owner>
</Object>
<Object>
<ObjectID>122</ObjectID>
[...]
After tagsoup:
<Objects>
<object>
<ObjectID>240</ObjectID>
[...]
<Status>Not Ready</Status>
</object>
<Title>Some description which includes word/word, 22,000</Title>
<Url>http://ift.tt/1wDs0ae;
[...]
<Owner>
<Name>JOHN MARSHALL, MR</Name>
</Owner>
<object>
<ObjectID>122</ObjectID>
[...]
I'm working on a Java project that uses these classes:
import org.ccil.cowan.tagsoup.Parser;
import org.ccil.cowan.tagsoup.XMLWriter;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
I'm using Java 6.
Any clues? I would expect the output for a valid XML file to be the same file (perhaps with minor details changed, but not the structure), wouldn't it?
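For comparison, here is a minimal sketch of the round trip I would expect, using only the JDK's own SAX parser and an identity Transformer instead of TagSoup. The class name XmlRoundTrip and the sample string are made up for illustration, and the code sticks to Java 6 syntax. TagSoup is documented as a parser for HTML, so this only shows what a strict XML parser does with the same kind of input; it is not a fix for TagSoup itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of my project: parses XML with the JDK parser
// and writes it back out through an identity transform.
public class XmlRoundTrip {

    public static String roundTrip(String xml) throws Exception {
        // Strict XML parser from the JDK (not TagSoup).
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // A Transformer created without a stylesheet copies its input to its output.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample shaped like the file above.
        String sample = "<Objects><Object><ObjectID>240</ObjectID>"
                + "<Status>Not Ready</Status><Title>t</Title></Object></Objects>";
        System.out.println(roundTrip(sample));
    }
}
```

With this approach the element names keep their case and Title stays inside Object, which is the structure I expected TagSoup to preserve.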