I have a zip file containing 100 XML files. I am processing it in Flume, and in an interceptor I uncompress it and put the XML files into HDFS. The code for this is:
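// Treat the incoming event body as a zip archive.
ZipInputStream inputStream = new ZipInputStream(new ByteArrayInputStream(event.getBody()));
try {
    // getNextEntry() moves to the start of the next entry and returns null after the last one.
    while (inputStream.getNextEntry() != null) {
        // Read the current entry fully and emit it as its own event.
        Event event2 = EventBuilder.withBody(IOUtils.toByteArray(inputStream));
        events.add(event2);
    }
} catch (IOException e) {
    // e.printStackTrace();
    System.err.println(e.getMessage());
}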
Everything is fine up to this point. But when I read these files in Java or MapReduce using a SAX parser, it throws a SAX exception because of some junk characters at the beginning of the XML. This might be the byte order mark (BOM). I am not able to solve this because it happens only some of the time; sometimes it runs fine. Please suggest a solution.
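If the junk really is a UTF-8 byte order mark (an assumption, I have not confirmed it), one thing I could try is stripping the three bytes 0xEF 0xBB 0xBF from each entry before building the event. A minimal sketch follows; the class name `BomStripper` and the method `stripUtf8Bom` are hypothetical helpers, not part of Flume, and it does not handle UTF-16 or UTF-32 BOMs.

```java
import java.util.Arrays;

final class BomStripper {
    // UTF-8 encoding of U+FEFF, the byte order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private BomStripper() {
    }

    /** Returns the body without a leading UTF-8 BOM; any other input is returned unchanged. */
    static byte[] stripUtf8Bom(byte[] body) {
        if (body.length >= UTF8_BOM.length
                && body[0] == UTF8_BOM[0]
                && body[1] == UTF8_BOM[1]
                && body[2] == UTF8_BOM[2]) {
            // Copy everything after the three BOM bytes.
            return Arrays.copyOfRange(body, UTF8_BOM.length, body.length);
        }
        return body;
    }
}
```

In the interceptor this would be called as `EventBuilder.withBody(BomStripper.stripUtf8Bom(IOUtils.toByteArray(inputStream)))`. Since the failure is intermittent, it may be that only some of the XML files in the archive were saved with a BOM, which this check would cover.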