XML : Parsing XML in Python using the cElementTree module

I have an XML file that I want to convert to a dictionary. I wrote the code below, but the output is not what I expected. The XML file is named core-site.xml:

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hdfs/tmp</value>
        <description>Temporary Directory.</description>
      </property>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.XXX.X.XXX:XXXX</value>
        <description>Use HDFS as file storage engine</description>
      </property>
    </configuration>

The code that I wrote is:

    import xml.etree.cElementTree
    import xml.etree.ElementTree as ET
    import warnings

    warnings.filterwarnings("ignore")

    class XmlListConfig(list):
        def __init__(self, aList):
            for element in aList:
                if element:
                    # treat like dict
                    if len(element) == 1 or element[0].tag != element[1].tag:
                        self.append(XmlDictConfig(element))
                    # treat like list
                    elif element[0].tag == element[1].tag:
                        self.append(XmlListConfig(element))
                elif element.text:
                    text = element.text.strip()
                    if text:
                        self.append(text)


    class XmlDictConfig(dict):
        def __init__(self, parent_element):
            if parent_element.items():
                self.update(dict(parent_element.items()))
            for element in parent_element:
                if element:
                    # treat like dict - we assume that if the first two tags
                    # in a series are different, then they are all different.
                    if len(element) == 1 or element[0].tag != element[1].tag:
                        aDict = XmlDictConfig(element)
                    # treat like list - we assume that if the first two tags
                    # in a series are the same, then the rest are the same.
                    else:
                        # here, we put the list in dictionary; the key is the
                        # tag name the list elements all share in common, and
                        # the value is the list itself
                        aDict = {element[0].tag: XmlListConfig(element)}
                    # if the tag has attributes, add those to the dict
                    if element.items():
                        aDict.update(dict(element.items()))
                    self.update({element.tag: aDict})
                # this assumes that if you've got an attribute in a tag,
                # you won't be having any text. This may or may not be a
                # good idea -- time will tell. It works for the way we are
                # currently doing XML configuration files...
                elif element.items():
                    self.update({element.tag: dict(element.items())})
                # finally, if there are no child tags and no attributes, extract
                # the text
                else:
                    self.update({element.tag: element.text})

    tree = ET.parse('core-site.xml')
    root = tree.getroot()
    xmldict = XmlDictConfig(root)
    print xmldict

This is the output that I am getting:

    {
        'property':
        {
            'name': 'fs.defaultFS',
            'value': 'hdfs://192.X.X.X:XXXX',
            'description': 'Use HDFS as file storage engine'
        }
    }

Why isn't the first property tag being shown? It only shows the data in the last property tag.
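
A likely explanation, offered as a sketch rather than a definitive answer: a Python dict holds one value per key, and both children of `<configuration>` share the tag `property`, so each `self.update({element.tag: aDict})` call in `XmlDictConfig` replaces whatever was stored under `'property'` before. The example below reproduces that overwrite on an inline copy of the XML and then shows one possible alternative, keying each entry by its `<name>` text. The helper `properties_to_dict` and the variable names are hypothetical and not part of the original code; it assumes every `<property>` has a `<name>` child.

```python
import xml.etree.ElementTree as ET

# Inline copy of core-site.xml so the sketch runs without the file.
XML_TEXT = """
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdfs/tmp</value>
    <description>Temporary Directory.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.XXX.X.XXX:XXXX</value>
    <description>Use HDFS as file storage engine</description>
  </property>
</configuration>
"""


def properties_to_dict(root):
    """Hypothetical helper: map each property's <name> to its other fields.

    Assumes every <property> element contains a <name> child.
    """
    result = {}
    for prop in root.findall('property'):
        # Collect the child tags of this <property> into a plain dict.
        fields = {child.tag: child.text for child in prop}
        # Use the <name> text as the key so entries no longer collide.
        name = fields.pop('name')
        result[name] = fields
    return result


root = ET.fromstring(XML_TEXT)

# Reproduce the overwrite: both children have the tag 'property',
# so the second update() replaces the value stored by the first.
overwritten = {}
for prop in root:
    overwritten.update({prop.tag: prop.findtext('name')})

config = properties_to_dict(root)
print(overwritten)
print(config)
```

With this layout, `config['hadoop.tmp.dir']['value']` returns `'/home/hdfs/tmp'`, and both properties survive because their keys differ. Collecting the `<property>` elements into a list would be another reasonable choice if the names are not guaranteed to be unique.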
