Using Python xml.sax to parse an XML file: why are values read twice?



I am trying to read an XML file using Python xml.sax.


The XML file is composed of blocks that look like this:



<NS:Member>
  <NS:Area fid='120410'>
    <NS:Code>10021</NS:Code>
    <NS:version>4</NS:version>
    <NS:versionDate>2004-03-29</NS:versionDate>
    <NS:theme>Buildings</NS:theme>
    <NS:Value>42.826432</NS:Value>
    <NS:changeHistory>
      <NS:changeDate>2002-09-26</NS:changeDate>
      <NS:reasonForChange>New</NS:reasonForChange>
    </NS:changeHistory>
    <NS:changeHistory>
      <NS:changeDate>2003-10-24</NS:changeDate>
      <NS:reasonForChange>Attributes</NS:reasonForChange>
    </NS:changeHistory>
    <NS:changeHistory>
      <NS:changeDate>2004-03-18</NS:changeDate>
      <NS:reasonForChange>Attributes</NS:reasonForChange>
    </NS:changeHistory>
    <NS:Group>Building</NS:Group>
    <NS:make>Manmade</NS:make>
    <NS:Level>50</NS:Level>
    <NS:polygon>
      <NS2:Polygon srsName='NS2:BNG'>
        <NS2:Boundary>
          <NS2:LinearRing>
            <NS2:coordinates>383415.110,400491.900 383411.090,400485.570 383415.500,400482.770 383420.430,400490.530 383418.780,400491.580 383417.930,400490.240 383415.160,400491.980 383415.110,400491.900
            </NS2:coordinates>
          </NS2:LinearRing>
        </NS2:Boundary>
      </NS2:Polygon>
    </NS:polygon>
  </NS:Area>
</NS:Member>


I am only interested in the ID, Group, make and coordinates parts of the XML file.


The code I am using is:



import xml.sax

class MyHandler(xml.sax.ContentHandler):

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.__CurrentData = ""    # name of the element currently open
        self.__ID = ""
        self.__Group = ""
        self.__make = ""
        self.__coordinates = []
        self.__coordString = ""

    # Called for every opening tag
    def startElement(self, tag, attributes):
        self.__CurrentData = tag
        if tag == "NS:Area":
            self.__ID = attributes["fid"]
            print "ID: ", self.__ID

    # Called for every closing tag
    def endElement(self, tag):
        if self.__CurrentData == "NS:Group":
            print "Group: ", self.__Group

        elif self.__CurrentData == "NS:make":
            print "Make: ", self.__make

        elif self.__CurrentData == "NS2:coordinates":
            print "coordinates: ", self.__coordString

            if self.__Group == "Building" and self.__make == "Manmade":
                # ==== Parse coordinates into [x, y] pairs ====
                self.__coordinates = []
                for point in self.__coordString.split():
                    x = float(point.split(",")[0])
                    y = float(point.split(",")[1])
                    self.__coordinates.append([x, y])

        self.__CurrentData = ""

    # Called with character data found between tags
    def characters(self, content):
        if self.__CurrentData == "NS:Area":
            self.__ID = content
        elif self.__CurrentData == "NS:Group":
            self.__Group = content
        elif self.__CurrentData == "NS:make":
            self.__make = content
        elif self.__CurrentData == "NS2:coordinates":
            self.__coordString = content
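The coordinate-parsing step inside endElement can be pulled out into a small helper so it can be tested on its own. This is a sketch: the name parse_coordinates is hypothetical, and it assumes each point is written as x,y with points separated by whitespace, as in the sample above. Instead of failing with a bare IndexError, it reports the token it could not parse.

```python
def parse_coordinates(coord_string):
    """Convert 'x1,y1 x2,y2 ...' into [[x1, y1], [x2, y2], ...].

    Raises ValueError naming the offending token when a point is not
    exactly one comma-separated x,y pair.
    """
    points = []
    for token in coord_string.split():  # whitespace separates the points
        parts = token.split(",")
        if len(parts) != 2:
            raise ValueError("malformed coordinate pair: %r" % token)
        points.append([float(parts[0]), float(parts[1])])
    return points


print(parse_coordinates("383415.110,400491.900 383411.090,400485.570"))
# [[383415.11, 400491.9], [383411.09, 400485.57]]
```

Called on a truncated string such as "0491.980 383415.110,400491.900", this raises ValueError for the token '0491.980', which makes a partial read much easier to spot than an IndexError.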


I expected the output to be:



ID: 120410
Group: Building
Make: Manmade
coordString: 383415.110,400491.900 383411.090,400485.570 383415.500,400482.770 383420.430,400490.530 383418.780,400491.580 383417.930,400490.240 383415.160,400491.980 383415.110,400491.900
coordinates: 383415.110,400491.900 383411.090,400485.570 383415.500,400482.770 383420.430,400490.530 383418.780,400491.580 383417.930,400490.240 383415.160,400491.980 383415.110,400491.900



However, what I actually get is:



ID: 120410
Group: Building
Make: Manmade
coordString: 383415.110,400491.900 383411.090,400485.570 383415.500,400482.770 383420.430,400490.530 383418.780,400491.580 383417.930,400490.240 383415.160,400491.980 383415.110,400491.900
coordString: 0491.980 383415.110,400491.900
coordinates: 0491.980 383415.110,400491.900



It seems that the parser reads the coordinates twice, and the second value is wrong: it contains only the tail end of the coordinate string. As a result, when I split the coordinates and put them into a list, I get this error:



File "C:\Users\Tim\Documents\Aptana Studio 3 Workspace\OSXMLReader\MyXMLParser.py", line 59, in endElement
    y = float(point.split(",")[1])
IndexError: list index out of range
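A likely explanation, offered as a sketch rather than a confirmed diagnosis: ContentHandler.characters() is not guaranteed to receive a whole text node in one call. The SAX contract lets a parser split contiguous character data into several chunks (for example at an internal buffer boundary while reading a large file). Because the handler above assigns self.__coordString = content, each chunk overwrites the previous one, so only the last piece, 0491.980 383415.110,400491.900, survives. Its first token, 0491.980, contains no comma, so point.split(",") returns a one-element list and [1] raises the IndexError. A common fix is to collect the chunks in a list and join them in endElement. The handler below is a minimal Python 3 sketch of that approach; the class name AreaHandler, the WANTED tuple and the trimmed SAMPLE document are assumptions for illustration, and feeding the document in 16-byte pieces through parser.feed() is only there to make chunked delivery likely.

```python
import xml.sax

# Elements whose text we want; the NS:/NS2: prefixes are kept as part of the
# tag name because namespace processing is off by default in make_parser().
WANTED = ("NS:Group", "NS:make", "NS2:coordinates")


class AreaHandler(xml.sax.ContentHandler):
    """Collects fid, Group, make and coordinates for each NS:Area."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.areas = []       # one dict per completed NS:Area
        self._current = None  # dict for the NS:Area being parsed
        self._chunks = None   # text chunks for a WANTED element, else None

    def startElement(self, tag, attributes):
        if tag == "NS:Area":
            self._current = {"fid": attributes.get("fid")}
        elif tag in WANTED:
            self._chunks = []  # start collecting text for this element

    def characters(self, content):
        # One text node may arrive in several calls: append, never overwrite.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, tag):
        if tag in WANTED and self._chunks is not None:
            text = "".join(self._chunks).strip()
            self._chunks = None
            if self._current is not None:
                self._current[tag] = text
        elif tag == "NS:Area" and self._current is not None:
            self.areas.append(self._current)
            self._current = None


# Trimmed version of the sample block from the question.
SAMPLE = b"""<NS:Member>
<NS:Area fid='120410'>
<NS:Group>Building</NS:Group>
<NS:make>Manmade</NS:make>
<NS2:coordinates>383415.110,400491.900 383411.090,400485.570
</NS2:coordinates>
</NS:Area>
</NS:Member>"""

parser = xml.sax.make_parser()
handler = AreaHandler()
parser.setContentHandler(handler)
# Feed small pieces so the coordinate text is likely split across calls.
for start in range(0, len(SAMPLE), 16):
    parser.feed(SAMPLE[start:start + 16])
parser.close()

for area in handler.areas:
    print(area["fid"], area["NS:Group"], area["NS:make"])
    print(area["NS2:coordinates"])
```

Joining only at endElement means the result no longer depends on how the parser happens to split the text, and the stored coordinate string can then be split into x,y pairs safely.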


This has been haunting me for a while, so I would really appreciate it if someone could help me out.


Many thanks.

