Python: How to delete nodes within a parent node of an XML file that are not mentioned in a list



I have a fairly large XML file from which I need to delete specific nodes inside a parent node. I have a list containing the names of the nodes that should remain in the XML. All other nodes within the parent node should be deleted, and the result should be written to a new XML file.


Only 'Instance' nodes should be deleted, namely those whose 'Data' value does not match any value in the list I provide. The rest of the XML, i.e. the 'Description' and 'Symbols' tags, should not be disturbed.


Assumption: the data read from the external file has already been parsed into a Python list variable.


Either DOM or SAX is fine for me, though I believe DOM is very quick. Any hints about available built-in functions (BIFs) or the logic to use would also be welcome.


Below is a sample XML file:



<?xml version='1.0' encoding='UTF-8'?>
<Identification>
<Description ID="12">Some text</Description>
</Identification>
<Symbols>
<Name Width="1">abc</Name>
<Name Width="2">def</Name>
</Symbols>

<Instance RowRef="A">
<DataSet>
<Data>12345678</Data>
</DataSet>
<DataSet>
<Data>abcd</Data>
</DataSet>
<DataSet>
<Data>abcd</Data>
</DataSet>
</Instance>
<Instance RowRef="B">
<DataSet>
<Data>87654321</Data>
</DataSet>
<DataSet>
<Data>abcd</Data>
</DataSet>
<DataSet>
<Data>abcd</Data>
</DataSet>
</Instance>
<Instance RowRef="C">
<DataSet>
<Data>06354237</Data>
</DataSet>
<DataSet>
<Data>abcd</Data>
</DataSet>
<DataSet>
<Data>abcd</Data>
</DataSet>
</Instance>
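
One possible approach, sketched with the standard library's `xml.etree.ElementTree` (a DOM-like, in-memory parser). This is only a sketch under several assumptions that the question does not confirm: the real file has a single root element (called `Root` here, a made-up name, since the sample shows none), the 'Instance' elements are direct children of that root, and the first 'Data' value inside each 'Instance' is the one to compare against the list. The helper name `keep_listed_instances`, the shortened sample, and the output file name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample from the question. The <Root> wrapper is an
# assumption: the real file is expected to have a single root element.
SAMPLE = """<Root>
<Identification>
<Description ID="12">Some text</Description>
</Identification>
<Symbols>
<Name Width="1">abc</Name>
<Name Width="2">def</Name>
</Symbols>
<Instance RowRef="A">
<DataSet><Data>12345678</Data></DataSet>
<DataSet><Data>abcd</Data></DataSet>
</Instance>
<Instance RowRef="B">
<DataSet><Data>87654321</Data></DataSet>
<DataSet><Data>abcd</Data></DataSet>
</Instance>
<Instance RowRef="C">
<DataSet><Data>06354237</Data></DataSet>
<DataSet><Data>abcd</Data></DataSet>
</Instance>
</Root>"""


def keep_listed_instances(root, wanted):
    """Remove each direct Instance child whose first Data value is not in wanted."""
    wanted = set(wanted)
    # Iterate over a snapshot so removing children does not disturb the loop.
    for instance in list(root.findall("Instance")):
        # findtext returns the text of the first matching element, or None.
        key = instance.findtext("DataSet/Data")
        if key is None or key.strip() not in wanted:
            root.remove(instance)
    return root


root = ET.fromstring(SAMPLE)
keep_listed_instances(root, ["12345678", "06354237"])
remaining = [inst.get("RowRef") for inst in root.findall("Instance")]
print(remaining)

# Writing the result to a new file (the file name is hypothetical):
# ET.ElementTree(root).write("filtered.xml", encoding="UTF-8", xml_declaration=True)
```

Elements other than 'Instance' are never touched, so 'Identification' and 'Symbols' pass through unchanged. If an 'Instance' should be kept when any of its 'Data' values is in the list, rather than only the first, the `findtext` call could be replaced with a check over `instance.iter("Data")`.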
