Utility to remove nodes from a HUGE (>2 GB) XML file



I'm working with several huge (>2 GB) XML files, and their size is causing problems.


(For example, I'm using XMLReader in a PHP script to parse smaller ~500 MB files, and that works fine, but XMLReader won't open these larger files.)


So my idea is to eliminate the big chunks of each file that I know I don't need.


For example, if the structure of the file looks like this:



<record id="1">
  <a>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </a>
  <b>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </b>
  <c>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </c>
</record>
...
<record id="999999">
  <a>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </a>
  <b>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </b>
  <c>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </c>
</record>


For my purposes, I only need the data in parent node <a> for each record. If I could eliminate parent nodes <b> and <c> from every record, I could reduce the size of the file substantially, so it would be small enough to work with normally.


What's the best way to do something like this (hopefully with something like grep or a free/cheap application)?


I've tried a trial version of Altova XMLSpy and it won't even open the XML file (I assume because the file is too large).
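To make the idea concrete, here is a rough sketch of the kind of streaming filter I have in mind, written in Python with the standard-library SAX parser so the file is never loaded into memory as a whole. It is only an illustration under some assumptions: the records are wrapped in a single root element (not shown in the sample above), the elements to drop are exactly <b> and <c>, and the names DropElements and strip_elements are made up for this example.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Assumption: element names to drop, taken from the sample structure.
SKIP = {"b", "c"}


class DropElements(XMLGenerator):
    """Copy SAX events to `out`, omitting the skipped elements and their contents."""

    def __init__(self, out, skip=SKIP):
        super().__init__(out, encoding="utf-8")
        self._skip = skip
        self._depth = 0  # greater than 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self._depth or name in self._skip:
            self._depth += 1  # entering (or nesting deeper inside) a dropped element
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1  # leaving a dropped element or one of its children
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_elements(src, dst, skip=SKIP):
    """Stream-parse `src` (a path or file object) and write filtered XML to `dst`."""
    xml.sax.parse(src, DropElements(dst, skip))


# Hypothetical usage (file names are placeholders):
#   with open("small.xml", "w", encoding="utf-8") as out:
#       strip_elements("huge.xml", out)
```

A filter like this keeps only a depth counter in memory, so its footprint should not grow with the file size. The whitespace that sat between the dropped elements is still copied through, and XML comments are not carried over, since the SAX content handler never receives them.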

