Removing duplicate XML child elements with Python



Given XML that looks like:



<collection>
<name>Bob</name>
<name>Bob</name>
<name>Linda</name>
</collection>

<collection>
<name>Linda</name>
<name>Tina</name>
</collection>


I want to merge the collections and remove duplicate children of the collection element, so that I end up with:



<collection>
<name>Bob</name>
<name>Linda</name>
<name>Tina</name>
</collection>


Currently, I'm using lxml.etree to parse the XML and grab the children, converting each child element to a string (e.g. 'Bob'), converting the list of these strings to a set to get the unique values, and then writing the unique values back into XML.


This seems circuitous and clunky, though. Is there a more elegant way?
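One possible shortcut is to skip the list-to-set-to-XML round trip and build the merged element in a single pass, keeping a set only to remember which values have already been written. The sketch below is a rough illustration, not a definitive answer. It assumes each collection arrives as its own XML string and that duplicates are judged by the text of each name child. The helper name merge_unique is hypothetical, and the code uses the standard library's xml.etree.ElementTree so it runs without extra packages; lxml.etree offers the same fromstring, findall, SubElement, and tostring calls.

```python
import xml.etree.ElementTree as ET


def merge_unique(collections, tag="name"):
    """Merge several <collection> documents, keeping the first copy of each child text.

    `merge_unique` is a hypothetical helper written for this illustration.
    """
    merged = ET.Element("collection")
    seen = set()  # text values already written to `merged`
    for xml_text in collections:
        root = ET.fromstring(xml_text)
        # findall(tag) returns only the direct children named `tag`
        for child in root.findall(tag):
            text = (child.text or "").strip()
            if text not in seen:
                seen.add(text)
                ET.SubElement(merged, tag).text = text
    return merged


first = "<collection><name>Bob</name><name>Bob</name><name>Linda</name></collection>"
second = "<collection><name>Linda</name><name>Tina</name></collection>"

merged = merge_unique([first, second])
names = [el.text for el in merged]
print(ET.tostring(merged, encoding="unicode"))
# <collection><name>Bob</name><name>Linda</name><name>Tina</name></collection>
```

Unlike converting the whole list to a set, this keeps the children in the order they first appear.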

