XML : Parse multiple xml with the same input file with python

I have the input file as below going upto 100k Records in a SINGLE file

  <pain001><CstmrCdtTrfInitn><GrpHdr><MsgId>ABC/120928/CCT001</MsgId><CreDtTm>2012-09-28T14:07:00</CreDtTm><NbOfTxs>100000</NbOfTxs><CtrlSum>11500000</CtrlSum> <InitgPty><Nm>ABC Corporation</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></InitgPty></GrpHdr><PmtInf><PmtInfId>CARCORP/086</PmtInfId><PmtMtd>TRF</PmtMtd><BtchBookg>false</BtchBookg><ReqdExctnDt>2012-09-29</ReqdExctnDt><Dbtr><Nm>CARCORP INC</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></Dbtr><DbtrAcct><Id><Othr><Id>00125574999</Id></Othr></Id></DbtrAcct><DbtrAgt><FinInstnId><BICFI>BBBBUS33</BICFI></FinInstnId></DbtrAgt><CdtTrfTxInf><PmtId><InstrId>ABC/120928/CCT001/01</InstrId><EndToEndId>ABC/4562/4</EndToEndId></PmtId><Amt><InstdAmt Ccy="JPY">100</InstdAmt></Amt><ChrgBr>SHAR</ChrgBr><CdtrAgt><FinInstnId><BICFI>AAAAGB2L</BICFI></FinInstnId></CdtrAgt><Cdtr><Nm>DEF Electronics</Nm><PstlAdr><AdrLine>Corn Exchange 5th Floor</AdrLine><AdrLine>Mark Lane 55</AdrLine><AdrLine>EC3R7NE London</AdrLine><AdrLine>GB</AdrLine></PstlAdr></Cdtr><CdtrAcct><Id><Othr><Id>23683707994125</Id></Othr></Id></CdtrAcct><Purp><Cd>GDDS</Cd></Purp><RmtInf><Strd><RfrdDocInf><Tp><CdOrPrtry><Cd>CINV</Cd></CdOrPrtry></Tp><Nb>4562</Nb><RltdDt>2012-09-08</RltdDt></RfrdDocInf></Strd></RmtInf></CdtTrfTxInf></PmtInf></CstmrCdtTrfInitn></pain001>  <pain001><CstmrCdtTrfInitn><GrpHdr><MsgId>ABC/120928/CCT001</MsgId><CreDtTm>2012-09-28T14:07:00</CreDtTm><NbOfTxs>100000</NbOfTxs><CtrlSum>11500000</CtrlSum> <InitgPty><Nm>ABC Corporation</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></InitgPty></GrpHdr><PmtInf><PmtInfId>CARCORP/086</PmtInfId><PmtMtd>TRF</PmtMtd><BtchBookg>false</BtchBookg><ReqdExctnDt>2012-09-29</ReqdExctnDt><Dbtr><Nm>CARCORP INC</Nm><PstlAdr><StrtNm>Times Square</StrtNm><BldgNb>7</BldgNb><PstCd>NY 10036</PstCd><TwnNm>New York</TwnNm><Ctry>US</Ctry></PstlAdr></Dbtr><DbtrAcct><Id><Othr><Id>00125574999</Id></Othr></Id></DbtrAcct><DbtrAgt><FinInstnId><BICFI>BBBBUS33</BICFI></FinInstnId></DbtrAgt><CdtTrfTxInf><PmtId><InstrId>ABC/120928/CCT001/01</InstrId><EndToEndId>ABC/4562/4</EndToEndId></PmtId><Amt><InstdAmt Ccy="JPY">100</InstdAmt></Amt><ChrgBr>SHAR</ChrgBr><CdtrAgt><FinInstnId><BICFI>AAAAGB2L</BICFI></FinInstnId></CdtrAgt><Cdtr><Nm>DEF Electronics</Nm><PstlAdr><AdrLine>Corn Exchange 5th Floor</AdrLine><AdrLine>Mark Lane 55</AdrLine><AdrLine>EC3R7NE London</AdrLine><AdrLine>GB</AdrLine></PstlAdr></Cdtr><CdtrAcct><Id><Othr><Id>23683707994125</Id></Othr></Id></CdtrAcct><Purp><Cd>GDDS</Cd></Purp><RmtInf><Strd><RfrdDocInf><Tp><CdOrPrtry><Cd>CINV</Cd></CdOrPrtry></Tp><Nb>4562</Nb><RltdDt>2012-09-08</RltdDt></RfrdDocInf></Strd></RmtInf></CdtTrfTxInf></PmtInf></CstmrCdtTrfInitn></pain001>    

I have used list comprehension with Xpath as my logic to parse the value

  def parsexml():   net=[]   tree = ET.parse('pain1.xml')   root = tree.getroot()         grp1x = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/MsgId')]   grp1y = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/CreDtTm')]   grp1 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/InitgPty/Nm')]   grp2 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/CreDtTm')]   grp3 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/NbOfTxs')]   grp4 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/CtrlSum')]   grp5 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/InitgPty/PstlAdr/StrtNm')]   grp6 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/InitgPty/PstlAdr/BldgNb')]   grp7 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/InitgPty/PstlAdr/PstCd')]   grp8 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/InitgPty/PstlAdr/TwnNm')]   grp9 = [e.text for e in root.findall('CstmrCdtTrfInitn/GrpHdr/InitgPty/PstlAdr/Ctry')]   grp10 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/PmtInfId')]   grp11 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/PmtMtd')]   grp12 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/BtchBookg')]   grp13 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/ReqdExctnDt')]   grp14 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/Dbtr/Nm')]   grp15 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/Dbtr/PstlAdr/StrtNm')]   grp16 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/Dbtr/PstlAdr/BldgNb')]   grp17 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/Dbtr/PstlAdr/PstCd')]   grp18 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/Dbtr/PstlAdr/TwnNm')]   grp19 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/Dbtr/PstlAdr/Ctry')]   grp20 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/DbtrAcct/Id/Othr/Id')]   grp21 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/DbtrAgt/FinInstnId/BICFI')]   grp22 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/PmtId/InstrId')]   grp23 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/PmtId/EndToEndId')]   grp24 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Amt/InstdAmt')]   grp25= [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Amt/InstdAmt[@Ccy="JPY"]')]   grp26 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/ChrgBr')]   grp27 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/CdtrAgt/FinInstnId/BICFI')]   grp28 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Cdtr/Nm')]   grp29 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Cdtr/PstlAdr/AdrLine[1]')]   grp30 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Cdtr/PstlAdr/AdrLine[2]')]   grp31 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Cdtr/PstlAdr/AdrLine[3]')]   grp32 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Cdtr/PstlAdr/AdrLine[4]')]   grp33 = [e.text for e in root.findall('pain001/CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/CdtrAcct/Id/Othr/Id')]   grp34 = [e.text for e in root.findall('pain001/CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/Purp/Cd')]   grp35 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/RmtInf/Strd/RfrdDocInf/Tp/CdOrPrtry/Cd')]   grp36 = [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/RmtInf/Strd/RfrdDocInf/Nb')]   grp37= [e.text for e in root.findall('CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf/RmtInf/Strd/RfrdDocInf/RltdDt')]        net = ",".join(grp1x+grp1y+grp1 + grp2 + grp3 + grp4 +grp5+grp6+grp7+grp8+grp9+grp10+grp11+grp12+grp13+grp14+grp15+grp16+grp17+grp18+grp19+grp20+grp21+grp22+grp23+grp24+grp25+grp26+grp27+grp28+grp29+grp30+grp31+grp32+grp33+grp34+grp35+grp36+grp37)   return net     

I am getting error below

  Traceback (most recent call last):    File "C:\Python27\parsefunc.py", line 10, in <module>      tree = ET.parse('pain1.xml')    File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse      tree.parse(source, parser)    File "C:\Python27\lib\xml\etree\ElementTree.py", line 656, in parse      parser.feed(data)    File "C:\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed      self._raiseerror(v)    File "C:\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror      raise err  xml.etree.ElementTree.ParseError: junk after document element: line 2, column 0    

The output which I need is after parsing is shown below

  ABC/120928/CCT001,2012-09-28T14:07:00,ABC Corporation,2012-09-28T14:07:00,100000,11500000,Times Square,7,NY 10036,New York,US,CARCORP/086,TRF,false,2012-09-29,CARCORP INC,Times Square,7,NY 10036,New York,US,00125574999,BBBBUS33,ABC/120928/CCT001/01,ABC/4562/1,100,100,SHAR,AAAAGB2L,DEF Electronics,Corn Exchange 5th Floor,Mark Lane 55,EC3R7NE London,GB,CINV,4562,2012-09-08  ABC/120928/CCT001,2012-09-28T14:07:00,ABC Corporation,2012-09-28T14:07:00,100000,11500000,Times Square,7,NY 10036,New York,US,CARCORP/086,TRF,false,2012-09-29,CARCORP INC,Times Square,7,NY 10036,New York,US,00125574999,BBBBUS33,ABC/120928/CCT001/01,ABC/4562/1,100,100,SHAR,AAAAGB2L,DEF Electronics,Corn Exchange 5th Floor,Mark Lane 55,EC3R7NE London,GB,CINV,4562,2012-09-08    

Is there a better approach than List Comprehension with element tree or how can I parse and get the output in the above manner to parse the other xml in the same file

No comments:

Post a Comment