Thursday, 17 July 2014

Using etree to break down an XML file into manageable files (Python)



I am currently trying to use Python to create three files from one XML file.


There are three types of data in the XML file: estates, symbol names and tick types.


I want three text files, one listing each of these.


This is currently my code, and it lists the estates absolutely fine:



from xml.dom import minidom

#xmldoc = minidom.parse('\\\\fmgdata1\data01\SMDS_Public\Symbols\Symbols.xml')

#Define the xmldoc object
xmldoc = minidom.parse('C:\\Temp\\Symbols.xml')

#Define EstateList by getting Elements by tag name
EstateList = xmldoc.getElementsByTagName('Estate')
#Print Estate List
print("There are currently %d data estates" % len(EstateList))
#print EstateList[0].attributes['EstateName'].value
for s in EstateList:
    print(s.attributes['EstateName'].value)

#Save Estate List to file
with open('dataestates.txt', 'w') as f:
    f.write("There are currently %d data estates \n" % len(EstateList))
    for s in EstateList:
        f.write(s.attributes['EstateName'].value + "\n")


However, when I move on to the other two, symbol names and tick types, I can't get anything to work. I can't get close to listing the tick types; I've tried attributes, tags, all sorts.
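
One possible reason the attribute lookups fail, judging by the sample XML in this post: the tick type names under each Estate are element text rather than attributes, and the Symbol elements spell their attribute TickType with a capital T. Below is a minimal minidom sketch under that assumption. The inline sample, its wrapping Root element and the tick_types helper are made up for illustration and are not part of the original file.

```python
from xml.dom import minidom

# Hypothetical inline sample. The wrapping <Root> element is an assumption,
# added so that the fragment parses as a single document.
SAMPLE = """<Root>
  <Estates>
    <Estate EstateName="BBG.DL.BOND.RAW._LIVE">
      <Ticktype>BBG_BGN</Ticktype>
      <Ticktype>BBG</Ticktype>
    </Estate>
    <Estate EstateName="BBG.DL.CCY.RAW._LIVE">
      <Ticktype>BBG</Ticktype>
    </Estate>
  </Estates>
</Root>"""


def tick_types(xmldoc):
    """Return the unique tick type names, in first-seen order."""
    seen = []
    for node in xmldoc.getElementsByTagName('Ticktype'):
        # Skip empty elements such as <Ticktype/>, which have no text node.
        if node.firstChild is None:
            continue
        # The name lives in the element's text node, not in an attribute.
        name = node.firstChild.data.strip()
        if name not in seen:
            seen.append(name)
    return seen


xmldoc = minidom.parseString(SAMPLE)
ticks = tick_types(xmldoc)
```

With the real file, minidom.parse('C:\\Temp\\Symbols.xml') would take the place of parseString, and the resulting list could be written out the same way as the estate list.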


Here is an example of the XML:



<Estates>
    <Estate EstateName="BBG.DL.BOND.RAW._LIVE">
        <Ticktype>BBG_BGN</Ticktype>
        <Ticktype>BBG_BVAL</Ticktype>
        <Ticktype>BBG_CBBT</Ticktype>
        <Ticktype>BBG_IXEP</Ticktype>
        <Ticktype>BBG_IXSP</Ticktype>
        <Ticktype>BBG_TRAC</Ticktype>
        <Ticktype>BBG</Ticktype>
    </Estate>
    <Estate EstateName="BBG.DL.CCY.RAW._LIVE">
        <Ticktype>BBG</Ticktype>
    </Estate>
</Estates>
<Symbols>
    <Symbol SymbolName="AT0000386073 Corp" Estate="BBG.DL.BOND.RAW._LIVE" TickType="BBG_BGN" />
    <Symbol SymbolName="AT0000386073 Corp" Estate="BBG.DL.BOND.RAW._LIVE" TickType="BBG_BVAL" />
</Symbols>
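
Since the title mentions etree, here is a rough sketch of how the three lists might be pulled out with xml.etree.ElementTree and written to separate files. It is only a sketch under assumptions: the Root wrapper around the sample, the helper names (unique, extract_lists, write_list) and the output names symbolnames.txt and ticktypes.txt are hypothetical, and it writes to a temporary directory rather than next to the script.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened copy of the sample above; the <Root> wrapper is an assumption.
SAMPLE = """<Root>
  <Estates>
    <Estate EstateName="BBG.DL.BOND.RAW._LIVE">
      <Ticktype>BBG_BGN</Ticktype>
      <Ticktype>BBG_BVAL</Ticktype>
    </Estate>
    <Estate EstateName="BBG.DL.CCY.RAW._LIVE">
      <Ticktype>BBG</Ticktype>
    </Estate>
  </Estates>
  <Symbols>
    <Symbol SymbolName="AT0000386073 Corp" Estate="BBG.DL.BOND.RAW._LIVE" TickType="BBG_BGN" />
    <Symbol SymbolName="AT0000386073 Corp" Estate="BBG.DL.BOND.RAW._LIVE" TickType="BBG_BVAL" />
  </Symbols>
</Root>"""


def unique(items):
    """Drop duplicates while keeping first-seen order."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def extract_lists(root):
    """Return (estates, symbols, ticks) as lists of unique strings."""
    # Estate and symbol names are attributes on their elements.
    estates = unique(e.get('EstateName') for e in root.iter('Estate'))
    symbols = unique(s.get('SymbolName') for s in root.iter('Symbol'))
    # Tick type names are the text inside each <Ticktype> element.
    ticks = unique((t.text or '').strip() for t in root.iter('Ticktype'))
    return estates, symbols, ticks


def write_list(path, items):
    """Write one item per line."""
    with open(path, 'w') as f:
        f.write('\n'.join(items) + '\n')


root = ET.fromstring(SAMPLE)
estates, symbols, ticks = extract_lists(root)

out_dir = tempfile.mkdtemp()
outputs = [('dataestates.txt', estates),
           ('symbolnames.txt', symbols),
           ('ticktypes.txt', ticks)]
for filename, items in outputs:
    write_list(os.path.join(out_dir, filename), items)
```

For the real file, ET.parse('C:\\Temp\\Symbols.xml').getroot() would stand in for ET.fromstring(SAMPLE). The same symbol name appears once per tick type in the sample, which is why the sketch removes duplicates before writing.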
