Saturday, 27 September 2014

Converting XML file into CSV



I have an XML file which looks like this:



<Organism>
    <Name>Bacillus halodurans C-125</Name>
    <Enzyme>M.BhaII</Enzyme>
    <Motif>GGCC</Motif>
    <Enzyme>M1.BhaI</Enzyme>
    <Motif>GCATC</Motif>
    <Enzyme>M2.BhaI</Enzyme>
    <Motif>GCATC</Motif>
</Organism>
<Organism>
    <Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>


I'm trying to write it into a CSV file like this:



Bacillus halodurans, GGCC
Bacillus halodurans, GCATC
Bacillus halodurans, GCATC
Bacteroides,


My approach was to create a list of tuples, each holding an organism name together with one of its motifs. I tried this using the ElementTree module:



import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
rebase = tree.getroot()

list = []

for organisms in rebase.findall('Organism'):
    name = organisms.find('Name').text
    for each_organism in organisms.findall('Motif'):
        try:
            motif = organisms.find('Motif').text
            print name, motif
        except AttributeError:
            print name


However, the output I get looks like this:


Bacillus halodurans, GGCC
Bacillus halodurans, GGCC
Bacillus halodurans, GGCC


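Only the first motif gets recorded, repeated once for each motif the organism has, and the organism without any motif is not printed at all. This is my first time working with ElementTree, so it's slightly confusing. Any help will be greatly appreciated.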


I don't need help with writing to a CSV file.
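
For reference, the repeated motif most likely comes from calling organisms.find('Motif') inside the inner loop: find() always returns the first matching child, so the loop variable is never actually used. An organism with no Motif children never enters the inner loop, which is why it prints nothing. Below is a minimal sketch of the list-of-tuples approach, written for Python 3. It assumes the Organism elements sit directly under a single root element; the root name Rebase, the SAMPLE string, and the helper organism_motif_pairs are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: <Organism> elements are direct children of one root element.
# The root name "Rebase" is hypothetical; the question does not show it.
SAMPLE = """<Rebase>
<Organism>
<Name>Bacillus halodurans C-125</Name>
<Enzyme>M.BhaII</Enzyme>
<Motif>GGCC</Motif>
<Enzyme>M1.BhaI</Enzyme>
<Motif>GCATC</Motif>
<Enzyme>M2.BhaI</Enzyme>
<Motif>GCATC</Motif>
</Organism>
<Organism>
<Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>
</Rebase>"""


def organism_motif_pairs(root):
    """Return (organism name, motif) tuples; motif is '' when none is listed."""
    pairs = []
    for organism in root.findall('Organism'):
        name = organism.find('Name').text
        motifs = organism.findall('Motif')
        if not motifs:
            # Keep organisms without motifs as a row with an empty motif.
            pairs.append((name, ''))
            continue
        for motif in motifs:
            # Read .text from the loop variable; organism.find('Motif')
            # would return the first <Motif> child on every iteration.
            pairs.append((name, motif.text))
    return pairs


rows = organism_motif_pairs(ET.fromstring(SAMPLE))
for name, motif in rows:
    print(name + ', ' + motif)
```

With a real file, ET.parse('file.xml').getroot() would take the place of ET.fromstring(SAMPLE). The sketch keeps the full organism name, including the strain; shortening it to genus and species, as in the desired output above, would be a separate step.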

