XML : TypeError on importing dict to dataframe

Any tips on importing lxml.etree start events into a pandas.DataFrame? The following code parses a small document with lxml and converts each entry into a DataFrame using from_records. [NOTE: I tried from_dict, but it needed a list per attribute, while from_records seems to handle dictionaries better.]

pd.DataFrame.from_records fails when coercing the attribute data, with this error:

  TypeError: Argument must be bytes or unicode, got 'int'

Thanks in advance for any tips.

CODE SNIPPET:

  x2="""<m2>
    <entry attrm201=1 attrm202 attrm203=1>m0201_t</entry>
    <entry attrm201=1 attrm0203=1>m0202_t</entry>
    <entry displevel=1 entrytype=1>m0202_t</entry>
  </m2>"""

  import pandas as pd
  objDF = pd.DataFrame()

  import io
  srcIO = io.StringIO(x2)
  #srcIO = io.BytesIO(str.encode(x2))

  from lxml import etree
  for event, e in etree.iterparse(srcIO, recover=True, html=True, events=('start', 'end')):
      if event != 'start' : continue
      if e.tag != 'entry' : continue
      elmDict = e.attrib
      elmDict[e.tag] = e.text
      df = pd.DataFrame.from_records(elmDict, index=[0])
      objDF = pd.concat(objDF, df)
      print(event, objDF)
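A likely cause, offered as a sketch rather than a confirmed diagnosis: e.attrib is lxml's own attribute proxy, not a plain dict, so when pandas probes it with an integer index lxml complains that the key is not a string. Copying it with dict(e.attrib) avoids that. Two other things in the snippet would probably bite next: pd.concat expects a list of frames, and element text is only guaranteed to be complete on the 'end' event. The version below collects one plain dict per entry and builds the DataFrame once at the end. It assumes the attribute values are quoted so the input is well-formed XML (the quoting and the rows list are my additions); for the original unquoted input, keeping recover=True and html=True on iterparse may still be needed.

```python
import io

import pandas as pd
from lxml import etree

# Assumption: same data as in the question, but with quoted attribute
# values so that it parses as well-formed XML.
x2 = """<m2>
  <entry attrm201="1" attrm202="" attrm203="1">m0201_t</entry>
  <entry attrm201="1" attrm0203="1">m0202_t</entry>
  <entry displevel="1" entrytype="1">m0202_t</entry>
</m2>"""

# Feed iterparse a binary file-like object.
srcIO = io.BytesIO(x2.encode("utf-8"))

rows = []  # one plain dict per <entry> element
for event, e in etree.iterparse(srcIO, events=("end",)):
    if e.tag != "entry":
        continue
    row = dict(e.attrib)  # copy the lxml attribute proxy into a real dict
    row[e.tag] = e.text   # text is complete once the 'end' event fires
    rows.append(row)

# Build the frame once; attributes missing from an entry become NaN.
objDF = pd.DataFrame.from_records(rows)
print(objDF)
```

Building the frame once from a list of dicts also avoids calling pd.concat inside the loop, which copies the accumulated data on every iteration.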
