Retrieving values of a tag in multiple places in an XML file using Minidom.



I wrote a script to pull out a tag value in an XML file. That works fine. Here comes the challenge for me. I have an XML file that has the following structure in numerous places.



<ch1 xml:id="i1120656"><title xml:id="id_4xwj">XXX</title>

<ch1><title xml:id="id_ed4s">YY</title>
<p>Applile. </p>
</ch1>
<ch1><title xml:id="id_y34k">ZZ</title>

</ch1>
<ch1><title xml:id="id_993y">VBN</title>
</grid>
</ch1>
<ch1><title xml:id="id_iz8b">GG</title>
<p>None</p>
</ch1>
<ch1><title xml:id="id_sjzb">NN</title>
</ch1>
<ch1><title xml:id="id_9dgx">E</title>
</ch1>
</ch1>


In the above file, the ch1 tag contains multiple nested ch1 tags, each with its own title tag. The same structure is repeated throughout the document. I need to pull out only the first title tag in each of these ch1 structures; in this example, that is XXX.


I have tried several approaches and none of them works. I am a newbie, so please help me find a way. Here are some of the things I tried.



from xml.dom.minidom import parse, parseString
import os, stat
import sys

def shahul(dir):
    for r, d, f in os.walk(dir):
        for files in f:
            if files.endswith(".xml"):
                dom = parse(os.path.join(r, files))
                nodelist = dom.getElementsByTagName("ch1").getElementsByTagName("title")[0]
                for node in nodelist:
                    print(files, node.firstChild.nodeValue, sep="\t")

shahul("location")


If I write it the following way, I am able to get all the title tags in the doc, but I want only the first title element in each of the ch1 tags.



dom = parse(os.path.join(r, files))

nodelist = dom.getElementsByTagName("title")
for node in nodelist:
    print(files, node.firstChild.nodeValue, sep="\t")


Please help me in doing this. Thanks a lot.
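
For reference, here is a minimal sketch of one possible approach, not a definitive solution. It relies on getElementsByTagName() returning a NodeList, which has no getElementsByTagName() method of its own, so each ch1 element has to be handled one at a time. It assumes that "the first title" means the title that is a direct child of each outermost ch1 (XXX in the example), and that nested ch1 elements should be skipped. The names first_titles, has_ancestor and SAMPLE_XML are hypothetical, and the sample is a simplified, well-formed stand-in for the real files.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical, simplified stand-in for the real files (assumption).
SAMPLE_XML = """<doc>
<ch1><title>XXX</title>
<ch1><title>YY</title><p>text</p></ch1>
<ch1><title>ZZ</title></ch1>
</ch1>
<ch1><title>AAA</title>
<ch1><title>BB</title></ch1>
</ch1>
</doc>"""


def has_ancestor(node, tag):
    """Return True if any ancestor of node is an element named tag."""
    parent = node.parentNode
    while parent is not None:
        if parent.nodeType == Node.ELEMENT_NODE and parent.tagName == tag:
            return True
        parent = parent.parentNode
    return False


def first_titles(dom, container="ch1", title="title"):
    """Collect the text of the first direct title child of each outermost container."""
    results = []
    for elem in dom.getElementsByTagName(container):
        # Assumption: only outermost containers are wanted, so skip nested ones.
        if has_ancestor(elem, container):
            continue
        # Look only at direct children, so titles of nested containers are ignored.
        for child in elem.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and child.tagName == title:
                if child.firstChild is not None:
                    results.append(child.firstChild.nodeValue)
                break  # stop at the first title of this container
    return results


if __name__ == "__main__":
    for text in first_titles(parseString(SAMPLE_XML)):
        print(text)
```

Inside the os.walk loop, the same function could be called as first_titles(parse(os.path.join(r, files))). If the first title of every ch1 is wanted, nested ones included, the has_ancestor check can simply be dropped.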

