How to query XML with complex types



I am building a program (Visual Studio 2010, .NET 4, C#-based console application) to gather specific information from a publicly available government report that is only available as an XML download. Its structure is similar to the following:



<Collections>
  <Collection>
    <Info id="123456" address="Some Place" name="Some Name"/>
    <Items>
      <Item1/>
      <Item2/>
      <Item3 I3="Y"/>
      <Item3A I3A1="N" I3A2="N" I3A3="Y"/>
      <Item3B I3B1="N" I3B2="N"/>
      <Item4/>
    </Items>
  </Collection>
  <Collection>...</Collection>
  <Collection>...</Collection>
</Collections>


The full file has hundreds of these Collection blocks and ranges from 50 to 100 MB. I have never worked with XML structured anything like this (it looks awful, right?) and have had a lot of trouble finding examples of queries that apply to it.
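
At that size, one direction that might help is to stream the file one Collection at a time instead of loading it all at once. The following is only a minimal sketch, assuming XmlReader and LINQ to XML (System.Xml.Linq) are acceptable here; the class name, method name, and the two-Collection test string are hypothetical and not part of the real report:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class CollectionStreamer
{
    // Yields each <Collection> as its own XElement, so only one block
    // is materialized in memory at a time.
    public static IEnumerable<XElement> StreamCollections(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Collection")
            {
                // ReadFrom consumes the whole element and leaves the reader
                // on the node that follows it, so no extra Read() is needed.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    static void Main()
    {
        // Hypothetical stand-in for the real download; for the actual file,
        // XmlReader.Create(path) could be used instead of a StringReader.
        const string xml =
            "<Collections>" +
            "<Collection><Info id='123456'/><Items><Item3 I3='Y'/></Items></Collection>" +
            "<Collection><Info id='654321'/><Items><Item3 I3='N'/></Items></Collection>" +
            "</Collections>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement collection in StreamCollections(reader))
            {
                Console.WriteLine((string)collection.Element("Info").Attribute("id"));
            }
        }
    }
}
```

Each yielded XElement can then be queried on its own with LINQ to XML or XPath, which keeps memory use tied to the size of one block rather than the whole file.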


I need to return the id attribute from the Info element of every Collection that has a "Y" in any attribute of the elements Item3 through Item3B. It's driving me a little crazy, because it would be easy if they had matching element names and matching attributes, but they are all unique. As far as I can tell, you can't put a partial-name wildcard in an XPath query, as in /Item3*[@I3*="Y"].
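
The closest thing I have been able to sketch so far relies on two standard XPath 1.0 features: starts-with(name(), ...) to match element names by prefix, and @* = 'Y' to test whether any attribute equals "Y". This is a rough attempt, not a confirmed solution. It assumes every element of interest has a name beginning with "Item3" (so a hypothetical Item30 would match too), it loads the whole document into memory, and the class name, method name, and second Collection are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class Item3Query
{
    // Returns Info/@id for every Collection in which some element whose
    // name starts with "Item3" has at least one attribute equal to "Y".
    public static List<string> FindIds(XDocument doc)
    {
        const string xpath =
            "/Collections/Collection" +
            "[Items/*[starts-with(name(), 'Item3')][@* = 'Y']]" +
            "/Info";

        return doc.XPathSelectElements(xpath)
                  .Select(info => (string)info.Attribute("id"))
                  .ToList();
    }

    static void Main()
    {
        // First Collection mirrors the sample structure; the second is a
        // hypothetical block with no "Y" so the filter has something to reject.
        XDocument doc = XDocument.Parse(@"
<Collections>
  <Collection>
    <Info id='123456' address='Some Place' name='Some Name'/>
    <Items>
      <Item1/>
      <Item2/>
      <Item3 I3='Y'/>
      <Item3A I3A1='N' I3A2='N' I3A3='Y'/>
      <Item3B I3B1='N' I3B2='N'/>
      <Item4/>
    </Items>
  </Collection>
  <Collection>
    <Info id='654321' address='Other Place' name='Other Name'/>
    <Items>
      <Item3 I3='N'/>
      <Item3A I3A1='N' I3A2='N' I3A3='N'/>
      <Item3B I3B1='N' I3B2='N'/>
    </Items>
  </Collection>
</Collections>");

        foreach (string id in FindIds(doc))
        {
            Console.WriteLine(id); // prints 123456
        }
    }
}
```

The @* = 'Y' comparison is true when at least one attribute in the set equals "Y", so the differing attribute names never have to be spelled out.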


Does anybody have any ideas on how to tackle this? Thanks!

