I'm a complete newbie to Python and have recently been using it to try to parse a large-ish XML file (700 MB).
Having looked around, I have been attempting to use the iterparse method to remove an element called Revision_History from the XML, since we no longer require this information.
I've been through a couple of variations of this script, so it could be horribly wrong by now. It seems to work fine for the first two removals; however, it then stops working and finds no further Revision_History tags.
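A likely contributor to this symptom: with events=("end",), every element's end event fires before its parent's, so calling elem.clear() on every element empties each Subject at its own end event, before the parent's findall() loop gets to look inside it. The sketch below runs the same kind of loop over a small made-up document (the root name Vocabulary and the layout are assumptions, not the real AAT.xml) and counts how many Revision_History elements the loop can still find, with and without the clear() call. It does not reproduce the exact "first two, then nothing" pattern, which presumably depends on how the real file nests its elements.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://localhost/namespace}"

# Hypothetical miniature of AAT.xml: two Subjects directly under the root.
SAMPLE = b"""<Vocabulary xmlns="http://localhost/namespace">
  <Subject><Term>a</Term><Revision_History>x</Revision_History></Subject>
  <Subject><Term>b</Term><Revision_History>y</Revision_History></Subject>
</Vocabulary>"""


def count_revisions_seen(clear_every_element):
    """Run the question's loop over SAMPLE and count the Revision_History
    elements it can still find under each Subject."""
    seen = 0
    for _, elem in ET.iterparse(io.BytesIO(SAMPLE), events=("end",)):
        # In SAMPLE, Subjects are only found here when the root ends.
        for subject in elem.findall(NS + "Subject"):
            seen += len(subject.findall(NS + "Revision_History"))
        # When clearing, each Subject was already emptied at its own end
        # event, before the root's end event reaches the loop above.
        if clear_every_element:
            elem.clear()
    return seen


print(count_revisions_seen(clear_every_element=True))   # 0
print(count_revisions_seen(clear_every_element=False))  # 2
```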
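import xml.etree.ElementTree as ET

# Only "end" events are requested, so each element is fully parsed when seen
for event, elem in ET.iterparse("AAT.xml", events=("end",)):
    if event == "end":
        # Look for Subject children of the element that has just ended
        for subject in elem.findall("{http://localhost/namespace}Subject"):
            print("subject found")
            for revision in subject.findall("{http://localhost/namespace}Revision_History"):
                print("revision found")
                subject.remove(revision)
                print("done")
    # Discard the element's children to keep memory use down
    elem.clear()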
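For reference, here is a minimal sketch of a streaming filter that writes a copy of the file without the Revision_History elements, using only the standard library. It handles each direct child of the root at that child's own end event (tracked with a depth counter), removes the Revision_History children from Subjects, writes the child out with ET.tostring(), and then detaches it from the root so memory stays bounded. The function name strip_revision_history is hypothetical, and the sketch assumes Subjects sit directly under the root, that everything uses http://localhost/namespace as the default namespace, and that the root's attributes can be dropped; those would need checking against the real file.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://localhost/namespace"  # namespace from the question
NS = "{" + NS_URI + "}"


def strip_revision_history(source, out):
    """Stream the XML in `source` to the text stream `out`, dropping every
    Revision_History that is a direct child of a Subject.

    Assumptions (hypothetical, not checked against the real AAT.xml):
    Subjects are direct children of the root, every element uses NS_URI
    as its default namespace, and the root's attributes need not be kept.
    """
    ET.register_namespace("", NS_URI)  # serialize NS_URI without a prefix
    depth = 0
    root = None
    root_name = ""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:  # the very first start event is the root
                root = elem
                root_name = elem.tag.split("}")[-1]
                out.write('<?xml version="1.0" encoding="utf-8"?>\n')
                out.write(f'<{root_name} xmlns="{NS_URI}">\n')
            continue
        depth -= 1
        if depth == 1:  # elem is a complete direct child of the root
            if elem.tag == NS + "Subject":
                for revision in elem.findall(NS + "Revision_History"):
                    elem.remove(revision)
            elem.tail = None  # the tail may or may not be parsed yet; drop it
            out.write(ET.tostring(elem, encoding="unicode") + "\n")
            root.remove(elem)  # forget the subtree we just wrote
        elif depth == 0:  # the root itself has ended
            out.write(f"</{root_name}>\n")
```

To run it on the real file, pass "AAT.xml" as source and an output file opened as text, e.g. open("AAT_stripped.xml", "w", encoding="utf-8") (a made-up name), as out. Each serialized child repeats the xmlns declaration, which is redundant but still valid XML.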
Any advice much appreciated!
Adam