XML : How do I keep track of parsing progress of large files in StAX?

I'm processing large (1TB) XML files using the StAX API. Let's assume we have a loop handling some elements:

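  XMLInputFactory fac = XMLInputFactory.newInstance();
  XMLStreamReader reader = fac.createXMLStreamReader(new FileReader(inputFile));
  // next() rather than nextTag(): nextTag() throws on text content and at END_DOCUMENT,
  // so a while (true) loop around it only ever ends with an exception
  while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT) {
          // handle contents
      }
  }
  reader.close();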

How do I keep track of overall progress within the large XML file? Fetching the offset from the reader works fine for smaller files:

  int offset = reader.getLocation().getCharacterOffset();    

but since getCharacterOffset() returns an int, it can only represent offsets up to Integer.MAX_VALUE (roughly 2 GB), so it will probably overflow on anything larger.
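One common workaround, sketched here under assumptions rather than offered as a StAX feature: instead of asking the parser for its position, count the bytes the parser pulls from the underlying stream in a long, and divide by the file size from Files.size(). CountingInputStream and countStartElements are hypothetical names made up for this sketch. The figure is approximate, because the parser reads ahead in buffered chunks, so it can run slightly ahead of the element currently being handled; for a 1 TB file that lag should be negligible.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.IntConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxProgress {

    /** Hypothetical helper: counts bytes read from the wrapped stream in a long, so there is no 2 GB limit. */
    static class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        long getCount() {
            return count;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }

        // mark/reset would rewind the stream without rewinding the count, so refuse them
        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public synchronized void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    /** Hypothetical example: parses the stream, reporting whole-percent progress whenever it changes. */
    static long countStartElements(InputStream in, long totalBytes, IntConsumer onPercent)
            throws XMLStreamException {
        CountingInputStream counter = new CountingInputStream(in);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(counter);
        long elements = 0;
        int lastPercent = -1;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    elements++; // handle contents here
                }
                // Approximate: bytes consumed by the parser, which may be a buffer ahead of the cursor
                int percent = totalBytes > 0
                        ? (int) Math.min(100, 100 * counter.getCount() / totalBytes)
                        : 100;
                if (percent != lastPercent) {
                    lastPercent = percent;
                    onPercent.accept(percent);
                }
            }
        } finally {
            reader.close(); // does not close the underlying input stream
        }
        return elements;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        if (args.length != 1) {
            System.err.println("usage: java StaxProgress <file.xml>");
            return;
        }
        Path file = Paths.get(args[0]);
        long total = Files.size(file); // long, so 1 TB files are fine
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long n = countStartElements(in, total, p -> System.err.println(p + "%"));
            System.out.println(n + " start elements");
        }
    }
}
```

Handing the parser an InputStream instead of a FileReader has a side benefit: the parser can take the encoding from the XML declaration rather than relying on the platform default charset.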
