Friday, 12 September 2014

Efficiently removing duplicate XML elements in C#



I have a couple of XML files that contain lots of duplicate entries, such as these:



<annotations>
  <annotation value=",Clear,Outdoors" eventID="2">
    <image location="Location 1" />
    <image location="Location 2" />
    <image location="Location 2" />
  </annotation>

  <annotation value=",Not a problem,Gravel,Shopping" eventID="2">
    <image location="Location 3" />
    <image location="Location 4" />
    <image location="Location 5" />
    <image location="Location 5" />
    <image location="Location 5" />
  </annotation>
</annotations>


I want to remove the duplicate <image> elements within each <annotation> element. My approach was to copy all the elements to a list and then compare them:



foreach (var el in xdoc.Descendants("annotation").ToList())
{
    foreach (var x in el.Elements("image").Attributes("location").ToList())
    {
        // add elements to a list
    }
}


Halfway through, I realized this is very inefficient and time-consuming. I'm fairly new to XML, and I was wondering whether there are any built-in methods in C# that I can use to remove duplicates.


I tried using



if (!x.Value.Distinct()) // does not compile: Distinct() returns a collection, not a bool
    x.Remove();


But that doesn't work, and neither does:



if (x.Value.Count() > 1) // Count() here returns the number of characters in the value, not the number of duplicate elements
    x.Remove();
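
For what it's worth, here is a minimal sketch of one way this could be done with LINQ to XML, assuming that two <image> elements count as duplicates when they sit in the same <annotation> and have the same location attribute. The class and method names (DuplicateImageRemover, RemoveDuplicateImages) are made up for illustration and are not part of any library. It groups the images of each annotation by location, keeps the first element of every group, and removes the rest.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateImageRemover
{
    // Hypothetical helper (name chosen for this example): within each <annotation>,
    // keep the first <image> for every distinct "location" value and remove the others.
    public static void RemoveDuplicateImages(XDocument xdoc)
    {
        // ToList() takes a snapshot so the tree can be modified safely while looping.
        foreach (var annotation in xdoc.Descendants("annotation").ToList())
        {
            var duplicates = annotation.Elements("image")
                .GroupBy(img => (string)img.Attribute("location")) // null key if the attribute is missing
                .SelectMany(group => group.Skip(1))                // everything after the first of each group
                .ToList();                                         // materialize before removing

            foreach (var image in duplicates)
            {
                image.Remove();
            }
        }
    }

    public static void Main()
    {
        // Sample data taken from the first annotation above (single quotes are valid in XML).
        var xdoc = XDocument.Parse(@"
<annotations>
  <annotation value=',Clear,Outdoors' eventID='2'>
    <image location='Location 1' />
    <image location='Location 2' />
    <image location='Location 2' />
  </annotation>
</annotations>");

        RemoveDuplicateImages(xdoc);

        // Each location should now appear at most once per annotation.
        Console.WriteLine(xdoc);
    }
}
```

Note that in this sketch <image> elements without a location attribute all fall into the same group (null key), so only the first of them would be kept; if that is not wanted, they could be filtered out before grouping.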
