XML: xml ElementTree missing elements (Python)


I've been trying to parse an XML file (JMdict_e.xml) for translation purposes. However, parsing the whole file returns an incomplete dataset: the entry I look up has fewer elements than it does in the source file.

Code:

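  import xml.etree.ElementTree as ET

  # Parse the full dictionary file
  tree2 = ET.ElementTree(file="JMdict_e.xml")
  root2 = tree2.getroot()

  # Child tags of the entry at index 55711, then the text of its fifth child
  print([i.tag for i in root2[55711]])
  print([i.text for i in root2[55711][4]])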

returns the following entries:

Result:

  ['ent_seq', 'k_ele', 'r_ele', 'r_ele', 'sense']
  ["Godan verb with `ru' ending", 'intransitive verb', 'to become less capable', 'to grow dull', 'to become blunt', 'to weaken']

By contrast, when the single entry is extracted from the original XML database into its own file, the following is obtained:

Code:

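  import xml.etree.ElementTree as ET

  # Parse the small file holding only the extracted entries
  tree = ET.ElementTree(file="new.xml")
  root = tree.getroot()

  print([i.tag for i in root[1]])
  # Prints an empty list for every child that is not a <sense>
  for i in root[1]:
      print([j.text for j in i if i.tag == "sense"])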

Result:

  ['ent_seq', 'k_ele', 'r_ele', 'r_ele', 'sense', 'sense', 'sense', 'sense', 'sense']
  ##Truncated empty lists
  ['にぶい', 'adjective (keiyoushi)', 'dull (e.g. a knife)', 'blunt']
  ['のろい is usu. in kana', 'thickheaded', 'obtuse', 'stupid']
  ['にぶい', 'dull (sound, color, etc.)', 'dim (light)']
  ['slow', 'sluggish', 'inert', 'lethargic']
  ['のろい', 'indulgent (esp. to the opposite sex)', 'doting']

I've been picking apart the data for a while but have not been able to identify the cause. I suspect that another entry in the XML file may be overriding what is shown.

XML fragments

  <JMdict>
    <entry>
      <ent_seq>1000000</ent_seq>
      <r_ele>
        <reb>ヽ</reb>
      </r_ele>
      <r_ele>
        <reb>くりかえし</reb>
      </r_ele>
      <sense>
        <pos>&n;</pos>
        <gloss>repetition mark in katakana</gloss>
      </sense>
    </entry>
    <entry>
      <ent_seq>1582430</ent_seq>
      <k_ele>
        <keb>鈍い</keb>
        <ke_pri>ichi1</ke_pri>
        <ke_pri>news2</ke_pri>
        <ke_pri>nf30</ke_pri>
      </k_ele>
      <r_ele>
        <reb>にぶい</reb>
        <re_pri>ichi1</re_pri>
        <re_pri>news2</re_pri>
        <re_pri>nf30</re_pri>
      </r_ele>
      <r_ele>
        <reb>のろい</reb>
        <re_pri>ichi1</re_pri>
      </r_ele>
      <sense>
        <stagr>にぶい</stagr>
        <pos>&adj-i;</pos>
        <gloss>dull (e.g. a knife)</gloss>
        <gloss>blunt</gloss>
      </sense>
      <sense>
        <s_inf>のろい is usu. in kana</s_inf>
        <gloss>thickheaded</gloss>
        <gloss>obtuse</gloss>
        <gloss>stupid</gloss>
      </sense>
      <sense>
        <stagr>にぶい</stagr>
        <gloss>dull (sound, color, etc.)</gloss>
        <gloss>dim (light)</gloss>
      </sense>
      <sense>
        <gloss>slow</gloss>
        <gloss>sluggish</gloss>
        <gloss>inert</gloss>
        <gloss>lethargic</gloss>
      </sense>
      <sense>
        <stagr>のろい</stagr>
        <gloss>indulgent (esp. to the opposite sex)</gloss>
        <gloss>doting</gloss>
      </sense>
    </entry>
  </JMdict>
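One way to check whether a positional index such as root2[55711] really points at the intended entry might be to look the entry up by its ent_seq value instead. The following is only a minimal sketch under a few assumptions: `SAMPLE` is a trimmed copy of the fragment above, the `&n;` and `&adj-i;` entities are written as plain text because the string carries no DTD, and `find_entry` is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fragment above (assumption: entities such as &adj-i;
# are replaced by plain text, since this string has no DTD to define them).
SAMPLE = """<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<r_ele><reb>ヽ</reb></r_ele>
<sense><pos>n</pos><gloss>repetition mark in katakana</gloss></sense>
</entry>
<entry>
<ent_seq>1582430</ent_seq>
<k_ele><keb>鈍い</keb></k_ele>
<r_ele><reb>にぶい</reb></r_ele>
<r_ele><reb>のろい</reb></r_ele>
<sense><stagr>にぶい</stagr><pos>adj-i</pos><gloss>dull (e.g. a knife)</gloss><gloss>blunt</gloss></sense>
<sense><gloss>thickheaded</gloss><gloss>obtuse</gloss></sense>
<sense><gloss>slow</gloss><gloss>sluggish</gloss></sense>
</entry>
</JMdict>"""


def find_entry(root, ent_seq):
    """Hypothetical helper: return the first <entry> whose <ent_seq> text
    equals ent_seq, or None if no entry matches."""
    for entry in root.iter("entry"):
        if entry.findtext("ent_seq") == ent_seq:
            return entry
    return None


root = ET.fromstring(SAMPLE)
entry = find_entry(root, "1582430")

# Collect the gloss texts of each <sense> in the matched entry
glosses = [[g.text for g in s.findall("gloss")] for s in entry.findall("sense")]
print(entry.findtext("k_ele/keb"), len(glosses), glosses)
```

Running the same lookup against the root of the full file, and printing the ent_seq and keb of root2[55711] next to it, would show whether both refer to the same entry.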

XML file in question

http://ift.tt/19gQ3am
