Parse floats from XML and return their paths, excluding certain attribute values



I am trying to parse the floats in a deeply nested XML file and return the paths of those nodes, excluding nodes with certain attribute values. For example, given the file below, I would like to return all floats but exclude certain attribute values, say month=05 and month=06:



<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <language>english</language>
    <currency>1.21$/kg</currency>
    <gdppc month="06">141100</gdppc>
    <gdpnp month="10">2.304e+0150</gdpnp>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <language>english</language>
    <currency>4.1$/kg</currency>
    <gdppc month="05">59900</gdppc>
    <gdpnp month="08">5.2e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>


I would like to return 2, 2.304e+0150, 5 and 5.2e-015 along with their paths, i.e. omit text that is not entirely numeric, such as english, 1.21$/kg or 4.1$/kg. I also want to exclude the text of elements with month=05 or month=06, i.e. 141100 and 59900.
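To pin down what "entirely numeric" means here, a small sketch using Python's built-in float(), which either parses the whole string or raises. The helper name parses_as_float is hypothetical, introduced only for illustration. Note that float() also accepts surrounding whitespace and strings such as "nan" or "inf", which may or may not be wanted.

```python
def parses_as_float(text):
    """Hypothetical helper: True if the whole string parses as a float."""
    try:
        float(text)
        return True
    except (ValueError, TypeError):
        # ValueError: non-numeric text such as "english" or "1.21$/kg"
        # TypeError: an element with no text at all (node.text is None)
        return False

samples = ["2", "2.304e+0150", "5", "5.2e-015",
           "english", "1.21$/kg", "4.1$/kg", None]
for s in samples:
    print(repr(s), parses_as_float(s))
```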


From a previous post, I have the following function, which returns the path of every value that can be parsed as a float:



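import xml.etree.ElementTree as ET

def extractNumbers(path, node):
    nums = []

    # Extend the path with this tag, plus its name attribute if present
    path += '/' + node.tag
    if 'name' in node.keys():
        path += '=' + node.attrib['name']

    # Keep the text only if the whole string parses as a float
    try:
        num = float(node.text)
        nums.append( (path, num) )
    except (ValueError, TypeError):
        # ValueError: non-numeric text; TypeError: node.text is None
        pass

    # Recurse into the child elements
    for e in list(node):
        nums.extend( extractNumbers(path, e) )

    return nums

tree = ET.parse('jerry.xml')
nums = extractNumbers('', tree.getroot())
print(len(nums))
print(nums)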
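To see the problem concretely, here is a minimal sketch that runs the function above on the sample document, inlined as a string (the SAMPLE name is an assumption, and it includes the closing </data> tag so it parses). It returns six results, including the 141100 and 59900 values that should be excluded.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained
SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <language>english</language>
    <currency>1.21$/kg</currency>
    <gdppc month="06">141100</gdppc>
    <gdpnp month="10">2.304e+0150</gdpnp>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <language>english</language>
    <currency>4.1$/kg</currency>
    <gdppc month="05">59900</gdppc>
    <gdpnp month="08">5.2e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""

def extractNumbers(path, node):
    nums = []
    path += '/' + node.tag
    if 'name' in node.keys():
        path += '=' + node.attrib['name']
    try:
        nums.append((path, float(node.text)))
    except (ValueError, TypeError):
        pass  # non-numeric text, or no text at all
    for e in node:
        nums.extend(extractNumbers(path, e))
    return nums

nums = extractNumbers('', ET.fromstring(SAMPLE))
for p, n in nums:
    print(p, n)
```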


How can I add the attribute restriction to this? When I add node.attrib["month"] in ["05","06"] to the except clause, it does not work. I would appreciate any help.
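One likely reason the except-clause attempt fails: the except block only runs when float() raises, and "141100" and "59900" parse successfully, so a check placed there never sees them. In addition, node.attrib["month"] raises KeyError on any element without a month attribute. A minimal sketch of one way to do it, assuming the exclusion should only skip the matching element's own text (its children are still visited): check the attribute with node.get(), which returns None when the attribute is missing, before trying to parse. The excluded parameter and its default are hypothetical additions to the original function.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <language>english</language>
    <currency>1.21$/kg</currency>
    <gdppc month="06">141100</gdppc>
    <gdpnp month="10">2.304e+0150</gdpnp>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <language>english</language>
    <currency>4.1$/kg</currency>
    <gdppc month="05">59900</gdppc>
    <gdpnp month="08">5.2e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""

# Hypothetical parameter: month values whose element text is skipped
def extractNumbers(path, node, excluded=frozenset({"05", "06"})):
    nums = []

    path += '/' + node.tag
    if 'name' in node.keys():
        path += '=' + node.attrib['name']

    # node.get() returns None for a missing attribute, so elements without
    # a month attribute are kept and no KeyError is raised.
    if node.get('month') not in excluded:
        try:
            nums.append((path, float(node.text)))
        except (ValueError, TypeError):
            pass  # non-numeric text, or no text at all

    # Still recurse: the restriction applies to this element's text only
    for child in node:
        nums.extend(extractNumbers(path, child, excluded))

    return nums

nums = extractNumbers('', ET.fromstring(SAMPLE))
for p, n in nums:
    print(p, n)
```

Passing the exclusion set as a parameter keeps the original call extractNumbers('', tree.getroot()) working while letting other month values be excluded without editing the function.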

