Saturday, 4 April 2015

HTML Parser on XML page, getting a dictionary of data



I have an XML page with information as follows:



<Observation>
<Currency_name>U.S. dollar </Currency_name>
<Observation_ISO4217>USD</Observation_ISO4217>
<Observation_date>2015-03-09</Observation_date>
<Observation_data>1.2598</Observation_data>
<Observation_data_reciprocal>0.7938</Observation_data_reciprocal>
</Observation>
<Observation>
<Currency_name>U.S. dollar </Currency_name>
<Observation_ISO4217>USD</Observation_ISO4217>
<Observation_date>2015-03-11</Observation_date>
<Observation_data>1.2764</Observation_data>
<Observation_data_reciprocal>0.7835</Observation_data_reciprocal>
</Observation>
<Observation>
<Currency_name>Argentine peso</Currency_name>
<Observation_ISO4217>ARS</Observation_ISO4217>
<Observation_date>2015-03-09</Observation_date>
<Observation_data>0.1438</Observation_data>
<Observation_data_reciprocal>6.9541</Observation_data_reciprocal>
</Observation>
<Observation>
<Currency_name>Argentine peso</Currency_name>
<Observation_ISO4217>ARS</Observation_ISO4217>
<Observation_date>2015-03-10</Observation_date>
<Observation_data>0.1440</Observation_data>
<Observation_data_reciprocal>6.9444</Observation_data_reciprocal>
</Observation>


I want to process the data so that I can query it, for example to compare the rates of one currency on two different dates, or to compare the currencies of two different countries. My problem is getting that information into a dictionary, which seems like a good way to store it.
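To make the goal concrete, here is a rough sketch of one shape the dictionary could take. The nesting (ISO 4217 code, then date, then rate) and the names `rates`, `usd_change` and `usd_vs_ars` are only assumptions for illustration; the values are copied from the sample above.

```python
# Hypothetical layout: ISO 4217 code -> observation date -> rate.
# The values are copied from the sample XML above.
rates = {
    "USD": {"2015-03-09": 1.2598, "2015-03-11": 1.2764},
    "ARS": {"2015-03-09": 0.1438, "2015-03-10": 0.1440},
}

# Compare two dates of the same currency.
usd_change = rates["USD"]["2015-03-11"] - rates["USD"]["2015-03-09"]

# Compare two currencies on the same date.
usd_vs_ars = rates["USD"]["2015-03-09"] / rates["ARS"]["2015-03-09"]
```

With this layout, a repeated currency adds another date under the same key instead of replacing what is already stored.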


I am currently using the following code, but it won't work because each country appears more than once. The actual page has five (5) entries for each country (57 countries in total).



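from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class myHTMLParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.country = []
        self.data = []
        self.dic = {}
        self.nameFlag = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <Currency_name> arrives as 'currency_name'.
        if tag == 'currency_name':
            self.nameFlag = True
        else:
            self.nameFlag = False

    def handle_endtag(self, tag):
        pass

    def handle_data(self, data):
        # Every repeated currency name resets its entry to an empty list.
        if data.strip() != '' and self.nameFlag:
            self.dic[data.strip()] = []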

Can someone suggest a good way to store the data when the same country appears multiple times?
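For reference, one possible direction is sketched below. It is a minimal sketch rather than a definitive solution, and it assumes Python 3. The class name `ObservationParser` and the attributes `rates`, `current`, `field` and `FIELDS` are made up for this example. It collects the fields of each `<Observation>` and files the rate under its ISO 4217 code and date, so repeated currencies accumulate instead of being overwritten.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ObservationParser(HTMLParser):
    """Collect observations into {ISO code: {date: rate}}."""

    # HTMLParser lowercases tag names, so the fields are listed in lowercase.
    FIELDS = {"observation_iso4217", "observation_date", "observation_data"}

    def __init__(self):
        super().__init__()
        self.rates = defaultdict(dict)
        self.current = {}   # fields of the <Observation> being read
        self.field = None   # field whose text is expected next, if any

    def handle_starttag(self, tag, attrs):
        if tag == "observation":
            self.current = {}
        self.field = tag if tag in self.FIELDS else None

    def handle_endtag(self, tag):
        self.field = None
        # Store the observation only if all three fields were seen.
        if tag == "observation" and len(self.current) == len(self.FIELDS):
            code = self.current["observation_iso4217"].strip()
            date = self.current["observation_date"].strip()
            rate = float(self.current["observation_data"].strip())
            self.rates[code][date] = rate

    def handle_data(self, data):
        # Text may arrive in several pieces, so append rather than assign.
        if self.field:
            self.current[self.field] = self.current.get(self.field, "") + data


sample = """
<Observation>
<Currency_name>U.S. dollar </Currency_name>
<Observation_ISO4217>USD</Observation_ISO4217>
<Observation_date>2015-03-09</Observation_date>
<Observation_data>1.2598</Observation_data>
<Observation_data_reciprocal>0.7938</Observation_data_reciprocal>
</Observation>
<Observation>
<Currency_name>U.S. dollar </Currency_name>
<Observation_ISO4217>USD</Observation_ISO4217>
<Observation_date>2015-03-11</Observation_date>
<Observation_data>1.2764</Observation_data>
<Observation_data_reciprocal>0.7835</Observation_data_reciprocal>
</Observation>
"""

parser = ObservationParser()
parser.feed(sample)
parser.close()
print(dict(parser.rates))
```

Keying on the ISO code rather than the currency name also sidesteps the trailing space in names such as "U.S. dollar ".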

