XML: Parsing a big XML file with element names repeated at different levels

I'm trying to parse a very big XML file in C#. It is big enough that some XML tools won't handle it, so I want to process it sequentially rather than loading it all into memory. Also, if there are certain errors in the source, I want to be able to report each error along with the line number in the XML on which it occurred.

Unfortunately, the XML repeats element names at different levels, something like:

  <foo>
      <foo>
          <foo>Something interesting</foo>
      </foo>
      Something else interesting
      <foo>Yes, it's horrid, isn't it?</foo>
  </foo>

And I need to keep track of the nesting level at which things occur.

I've tried using XmlTextReader, but I seem to just get a flat list of foo elements, and I can't work out how to track the nesting level. My next thought was to call ReadSubtree on each element, so that I would know when I had returned from a nested element. But ReadSubtree returns an XmlReader, not an XmlTextReader, so I no longer have access to the line number in the original XML. A web search suggests using ReadOuterXml to get the text of the node and creating another reader from that, but that appears to read the node's entire text into memory, so I'm back to my original problem of the file being too big.

So how can I keep track of nesting level (when the element names don't help) and source line number without loading the whole file in?
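
For reference, here is a minimal sketch of one direction that might work, not a definitive answer. It relies on two things in System.Xml: the Depth property that every XmlReader exposes, and the IXmlLineInfo interface, which XmlTextReader and the text-based readers returned by XmlReader.Create implement. The class name NestingDemo and the inline sample string are made up for illustration; for the real file, the StringReader would be swapped for a file path or stream so the document is still read sequentially.

```csharp
using System;
using System.IO;
using System.Xml;

class NestingDemo
{
    static void Main()
    {
        // Hypothetical inline sample standing in for the large file.
        const string xml =
            "<foo>\n" +
            "    <foo>\n" +
            "        <foo>Something interesting</foo>\n" +
            "    </foo>\n" +
            "    Something else interesting\n" +
            "    <foo>Yes, it's horrid, isn't it?</foo>\n" +
            "</foo>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // The cast yields null if this reader cannot report positions.
            IXmlLineInfo lineInfo = reader as IXmlLineInfo;

            // Read() advances one node at a time; nothing is buffered as a tree.
            while (reader.Read())
            {
                bool isElement = reader.NodeType == XmlNodeType.Element;
                bool isText = reader.NodeType == XmlNodeType.Text;
                if (!isElement && !isText)
                {
                    continue; // skip end tags, whitespace, comments, etc.
                }

                // Fall back to 0 when no line information is available.
                int line = (lineInfo != null && lineInfo.HasLineInfo())
                    ? lineInfo.LineNumber
                    : 0;

                string detail = isElement ? reader.Name : reader.Value.Trim();

                // Depth is 0 for the root element and grows with each nesting level.
                Console.WriteLine("line {0}, depth {1}: {2} '{3}'",
                    line, reader.Depth, reader.NodeType, detail);
            }
        }
    }
}
```

Because Depth is reported by the reader itself, the repeated element name does not matter: two foo elements at different levels simply report different depths, and a text node reports a depth one greater than the element that contains it.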
