I am trying to parse a 10,000-line XML file with Python. The file, an excerpt of which is shown below, describes the physical properties of a long list of elements. Each element is described in the XML file by a "ble" element, and I have already written a Python class for each "ble" type I will encounter.
<header>
<!--SlotModels-->
<slotModels id="SpokeOptimusSlotModels" xml:base="spoke.xml" xmlns:xi="http://ift.tt/17F4Wja">
<slotModel id="spokeLEDP">
<var id="g1" type="double"/>
<ble id="DR0005" type="drift">
<d id="l" type="double" unit="mm">240</d>
<d id="r" type="double" unit="mm">20</d>
<d id="ry" type="double" unit="mm">0</d>
</ble>
<ble id="QD0020" model="Quad310" type="quad">
<d id="l" type="double" unit="mm">310</d>
<d id="g" type="double" unit="T/m">g1</d>
<d id="r" type="double" unit="mm">30</d>
</ble>
</slotModel>
<slotModel id="spokeLwu">
<var id="g1" type="double"/>
<ble id="DR0010" type="drift">
<d id="l" type="double" unit="mm">160</d>
<d id="r" type="double" unit="mm">30</d>
<d id="ry" type="double" unit="mm">0</d>
</ble>
<ble id="QD0020" model="Quad310" type="quad">
<d id="l" type="double" unit="mm">310</d>
<d id="g" type="double" unit="T/m">g1</d>
<d id="r" type="double" unit="mm">30</d>
</ble>
</slotModel>
<slotModel id="spokeCryomodule">
<var id="xelmax1" type="double"/>
<var id="rfpdeg1" type="double"/>
<ble id="DR0010" type="drift">
<d id="l" type="double" unit="mm">368.5</d>
<d id="r" type="double" unit="mm">28</d>
<d id="ry" type="double" unit="mm">0</d>
</ble>
<ble id="FM0020" model="spokeCavity" type="fieldMap">
<d id="rfpdeg" type="double" unit="deg">rfpdeg1</d>
<d id="xelmax" type="double" unit="unit">xelmax1</d>
<d id="radiusmm" type="double" unit="mm">28</d>
<d id="lengthmm" type="double" unit="mm">994</d>
<d id="file" type="string" unit="unit">Spoke_F2F</d>
<d id="scaleFactor" type="double" unit="unit">1.0</d>
</ble>
</slotModel>
</slotModels>
<slotModels id="medBetaSlotModels" xml:base="medBeta.xml" xmlns:xi="http://ift.tt/17F4Wja">
<slotModel id="medBetaLwu">
<var id="g1" type="double"/>
<ble id="DR0010" type="drift">
<d id="l" type="double" unit="mm">256.2</d>
<d id="r" type="double" unit="mm">50</d>
<d id="ry" type="double" unit="mm">0</d>
</ble>
<ble id="QD0020" model="Quad410" type="quad">
<d id="l" type="double" unit="mm">410</d>
<d id="g" type="double" unit="T/m">g1</d>
<d id="r" type="double" unit="mm">50</d>
</ble>
</slotModel>
<slotModel id="medBetaCryomodule">
<var id="xelmax1" type="double"/>
<var id="rfpdeg1" type="double"/>
<ble id="DR0010" type="drift">
<d id="l" type="double" unit="mm">414.4</d>
<d id="r" type="double" unit="mm">46.87</d>
<d id="ry" type="double" unit="mm">0</d>
</ble>
<ble id="FM0020" model="medBetaCavity" type="fieldMap">
<d id="rfpdeg" type="double" unit="deg">rfpdeg1</d>
<d id="xelmax" type="double" unit="unit">xelmax1</d>
<d id="radiusmm" type="double" unit="mm">46.87</d>
<d id="lengthmm" type="double" unit="mm">1258.8</d>
<d id="file" type="string" unit="unit">MB_F2F</d>
<d id="scaleFactor" type="double" unit="unit">1.0</d>
</ble>
</slotModel>
</slotModels>
<cellModels id="SpokeOptimusCellModels" xml:base="spoke.xml" xmlns:xi="http://ift.tt/17F4Wja">
<cellModel id="spokeLEDPCell">
<var id="g1" type="double"/>
<var id="xelmax1" type="double"/>
<var id="rfpdeg1" type="double"/>
<slot id="slot010" model="spokeLEDP">
<d id="g1" type="double">g1</d>
</slot>
<slot id="slot020" model="spokeCryomodule">
<d id="xelmax1" type="double">xelmax1</d>
<d id="rfpdeg1" type="double">rfpdeg1</d>
</slot>
</cellModel>
<cellModel id="spokeCell">
<var id="g1" type="double"/>
<var id="xelmax1" type="double"/>
<var id="rfpdeg1" type="double"/>
<slot id="slot010" model="spokeLwu">
<d id="g1" type="double">g1</d>
</slot>
<slot id="slot020" model="spokeCryomodule">
<d id="xelmax1" type="double">xelmax1</d>
<d id="rfpdeg1" type="double">rfpdeg1</d>
</slot>
</cellModel>
</cellModels>
<cellModels id="medBetaCellModels" xml:base="medBeta.xml" xmlns:xi="http://ift.tt/17F4Wja">
<cellModel id="medBetaCell">
<var id="g1" type="double"/>
<var id="xelmax1" type="double"/>
<var id="rfpdeg1" type="double"/>
<slot id="slot010" model="medBetaLwu">
<d id="g1" type="double">g1</d>
</slot>
<slot id="slot020" model="medBetaCryomodule">
<d id="xelmax1" type="double">xelmax1</d>
<d id="rfpdeg1" type="double">rfpdeg1</d>
</slot>
</cellModel>
</cellModels>
</header>
<linac>
<section id="SPOK" rfHarmonic="1" xml:base="spoke.xml" xmlns:xi="http://ift.tt/17F4Wja">
<cell id="cell010" model="spokeLEDPCell">
<d id="g1" type="double">5.23025</d>
<d id="g2" type="double">-4.68975</d>
<d id="xelmax1" type="double">0.868945</d>
<d id="xelmax2" type="double">0.865525</d>
<d id="rfpdeg1" type="double">-6.65943</d>
<d id="rfpdeg2" type="double">4.03247</d>
</cell>
<cell id="cell020" model="spokeCell">
<d id="g1" type="double">4.85226</d>
<d id="g2" type="double">-4.77927</d>
<d id="xelmax1" type="double">0.890626</d>
<d id="xelmax2" type="double">0.891124</d>
<d id="rfpdeg1" type="double">22.0298</d>
<d id="rfpdeg2" type="double">31.6618</d>
</cell>
<cell id="cell030" model="spokeCell">
<d id="g1" type="double">4.46164</d>
<d id="g2" type="double">-4.45154</d>
<d id="xelmax1" type="double">1</d>
<d id="xelmax2" type="double">1</d>
<d id="rfpdeg1" type="double">37.712</d>
<d id="rfpdeg2" type="double">47.397</d>
</cell>
</section>
<!--SPOKE-->
<section id="MBL" rfHarmonic="2" xml:base="medBeta.xml" xmlns:xi="http://ift.tt/17F4Wja">
<cell id="cell010" model="medBetaCell">
<d id="g1" type="double">3.39121</d>
<d id="g2" type="double">-3.31534</d>
<d id="xelmax1" type="double">0.44729</d>
<d id="xelmax2" type="double">0.4477</d>
<d id="xelmax3" type="double">0.453125</d>
<d id="xelmax4" type="double">0.453307</d>
<d id="rfpdeg1" type="double">55.8358</d>
<d id="rfpdeg2" type="double">61.7858</d>
<d id="rfpdeg3" type="double">66.1437</d>
<d id="rfpdeg4" type="double">72.2867</d>
</cell>
<cell id="cell020" model="medBetaCell">
<d id="g1" type="double">3.60124</d>
<d id="g2" type="double">-3.64339</d>
<d id="xelmax1" type="double">0.512886</d>
<d id="xelmax2" type="double">0.512886</d>
<d id="xelmax3" type="double">0.512886</d>
<d id="xelmax4" type="double">0.512886</d>
<d id="rfpdeg1" type="double">77.201</d>
<d id="rfpdeg2" type="double">84.17</d>
<d id="rfpdeg3" type="double">91.207</d>
<d id="rfpdeg4" type="double">98.296</d>
</cell>
</section>
</linac>
Note the structure of the XML file: the "slots" and "cells", which are basically lists of ble's that follow a discernible pattern, are defined in the header as slotModel and cellModel elements, while the data is kept in "linac".
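To make that split concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree. The SAMPLE string is a hypothetical, trimmed-down stand-in for the real file, and the enclosing "root" element is an assumption, since the excerpt does not show what wraps "header" and "linac".

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample in the same shape as the excerpt.
# The enclosing <root> element is an assumption; the excerpt does not show it.
SAMPLE = """
<root>
  <header>
    <slotModels id="SpokeOptimusSlotModels">
      <slotModel id="spokeLEDP">
        <var id="g1" type="double"/>
        <ble id="DR0005" type="drift">
          <d id="l" type="double" unit="mm">240</d>
        </ble>
        <ble id="QD0020" model="Quad310" type="quad">
          <d id="g" type="double" unit="T/m">g1</d>
        </ble>
      </slotModel>
    </slotModels>
    <cellModels id="SpokeOptimusCellModels">
      <cellModel id="spokeLEDPCell">
        <var id="g1" type="double"/>
        <slot id="slot010" model="spokeLEDP">
          <d id="g1" type="double">g1</d>
        </slot>
      </cellModel>
    </cellModels>
  </header>
  <linac>
    <section id="SPOK" rfHarmonic="1">
      <cell id="cell010" model="spokeLEDPCell">
        <d id="g1" type="double">5.23025</d>
      </cell>
    </section>
  </linac>
</root>
"""

root = ET.fromstring(SAMPLE)

# Definitions: every slotModel and cellModel in the header, keyed by id.
slot_models = {m.get("id"): m for m in root.iter("slotModel")}
cell_models = {m.get("id"): m for m in root.iter("cellModel")}

# Data: each cell in "linac" names the cellModel it instantiates.
cells = [(c.get("id"), c.get("model")) for c in root.find("linac").iter("cell")]

# The parameters a model expects are its <var> children.
spoke_ledp_vars = [v.get("id") for v in slot_models["spokeLEDP"].findall("var")]
```

The two dictionaries hold the definitions, while the cells list holds the references from "linac" back into them.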
What I would like to do
I would like to scan "linac", and expand each of the cells into a list of fully instantiated objects of the classes I have written to represent each of the ble's.
To do this, I would like to parse the header in a way that returns one callable function per slotModel or cellModel. In other words, I would like to automatically generate functions something like the following:
def spokeLEDP(g1):
    # Expand the "spokeLEDP" slotModel into its ble objects.
    bleList = []
    bleList.append(drift(l=240, r=20, ry=0))
    bleList.append(quad(l=310, g=g1, r=30))
    return bleList

def spokeLEDPCell(g1, xelmax1, rfpdeg1):
    # Expand the "spokeLEDPCell" cellModel slot by slot.
    myList = []
    myList.append(spokeLEDP(g1))
    myList.append(spokeCryomodule(xelmax1, rfpdeg1))
    return myList
I realise that I would need to flatten the list properly, but I hope you get the idea.
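The flattening step could be sketched roughly as follows. The helper name flatten is hypothetical, and plain strings stand in for the real ble objects, on the assumption that those objects are not themselves lists.

```python
def flatten(items):
    """Yield the leaves of an arbitrarily nested list, in order."""
    for item in items:
        if isinstance(item, list):
            # A nested list is one expanded slot or cell: descend into it.
            yield from flatten(item)
        else:
            # Anything that is not a list is treated as a single ble object.
            yield item


# Strings stand in for instantiated ble objects (an assumption for the demo).
nested = [["drift", "quad"], ["drift", "fieldMap"]]
flat = list(flatten(nested))
```

Because the helper recurses, it copes with cells containing slots containing ble's without needing to know the nesting depth in advance.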
My plan
The only way I can currently see to generate the functions dynamically from the header is a two-step process: first, use Python to write a text file containing the necessary function definitions; then import that file into the working code.
This seems very clunky and unwieldy.
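To show what the first step of that plan would involve, here is a minimal sketch that turns one slotModel element into Python source text. The function names slot_model_source and literal are hypothetical, and the sketch assumes that each ble "type" attribute matches the name of the class to call and that every "d" value is either a number, a variable name, or (when type="string") a plain string.

```python
import xml.etree.ElementTree as ET

SLOT_XML = """
<slotModel id="spokeLEDP">
  <var id="g1" type="double"/>
  <ble id="DR0005" type="drift">
    <d id="l" type="double" unit="mm">240</d>
    <d id="r" type="double" unit="mm">20</d>
    <d id="ry" type="double" unit="mm">0</d>
  </ble>
  <ble id="QD0020" model="Quad310" type="quad">
    <d id="l" type="double" unit="mm">310</d>
    <d id="g" type="double" unit="T/m">g1</d>
    <d id="r" type="double" unit="mm">30</d>
  </ble>
</slotModel>
"""


def literal(d):
    """Return the source text for one <d> value."""
    text = d.text.strip()
    # String-typed values are quoted; everything else is copied verbatim,
    # so "g1" becomes a parameter reference and "240" a numeric literal.
    return repr(text) if d.get("type") == "string" else text


def slot_model_source(model):
    """Return Python source text for one <slotModel> element."""
    params = [v.get("id") for v in model.findall("var")]
    lines = [
        "def {}({}):".format(model.get("id"), ", ".join(params)),
        "    bleList = []",
    ]
    for ble in model.findall("ble"):
        kwargs = ", ".join(
            "{}={}".format(d.get("id"), literal(d)) for d in ble.findall("d")
        )
        # Assumes the class name equals the ble "type" attribute.
        lines.append("    bleList.append({}({}))".format(ble.get("type"), kwargs))
    lines.append("    return bleList")
    return "\n".join(lines) + "\n"


source = slot_model_source(ET.fromstring(SLOT_XML))
# Step 1 of the plan would write `source` to a .py file; step 2 would import it.
```

Even for a single slotModel this needs string templating and care over quoting, which is part of why the approach feels unwieldy.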
My question
Is there a way to do what I want to do in one step, without a lot of repetitive parsing of the "cell" and "slot" elements in the XML header?
Many thanks for reading this far, and for any help you can offer.
(Note: I have no control over the structure of the XML file.)