I noticed the xml entities " will automatically force to convert to their real original characters:
>>> from lxml import etree as et
>>> parser = et.XMLParser()
>>> xml = et.fromstring("<root><elem>"hello world"</elem></root>", parser)
>>> print et.tostring(xml, pretty_print=1)
<root>
<elem>"hello world"</elem>
</root>
>>>
I found one related old(2009-02-07) thread:
s = cStringIO.StringIO(""""She's the MAN!"""") e = etree.parse(s,etree.XMLParser(resolve_entities=False))
Note that there's also etree.fromstring().
etree.tostring(e) '"She\'s the MAN!"'
I would have expected resolve_entities=False to have prevented the translation of, eg, " to ".
The "resolve_entities" option is meant for entities defined in a DTD of which you want to keep the reference instead of the resolved value. The entities you mention are part of the XML spec, not of a DTD.
is there another way to prevent this behavior (or, if nothing else, reverse it after the fact)?
Well, what you get is well-formed XML. May I ask why you need the entity references in the output?
Still, the response is why you want to do that, there's no direct answer to this problem. I'm quite surprised because the etree parser force the conversion without giving an option to disable it.
The following example shown why i need this solution, this xml is for xbmc skinning parser:
>>> print open("/tmp/so.xml").read() #the original file
<window id="1234">
<defaultcontrol>101</defaultcontrol>
<controls>
<control type="button" id="101">
<onfocus>Dialog.Close(212)</onfocus>
<onfocus>SetFocus(11)</onfocus>
</control>
<control type="button" id="102">
<visible>StringCompare(VideoPlayer.PlotOutline,Stream.IsPlaying) + !Skin.HasSetting(Stream.IsUpdated)</visible>
<onfocus>RunScript(script.test)</onfocus>
<onfocus>SetFocus(11)</onfocus>
</control>
<control type="button" id="103">
<visible>SubString(VideoPlayer.PlotOutline,Video.IsPlaying)</visible>
<onfocus>Close</onfocus>
<onfocus>RunScript("/.xbmc/addons/script.hello.world/default.py","$INFO[VideoPlayer.Album]","$INFO[VideoPlayer.Genre]")</onfocus>
</control>
</controls>
</window>
>>> root = et.parse("/tmp/so.xml", parser)
>>> r = root.getroot()
>>> for c in r:
... for cc in c:
... if cc.attrib.get('id') == "103":
... cc.remove(cc[1]) #remove 1 element, it's just a demonstrate
...
>>> o = open("/tmp/so.xml", "w")
>>> o.write(et.tostring(r, pretty_print=1)) #save it back
>>> o.close()
>>> print open("/tmp/so.xml").read() #the file after implemented
<window id="1234">
<defaultcontrol>101</defaultcontrol>
<controls>
<control type="button" id="101">
<onfocus>Dialog.Close(212)</onfocus>
<onfocus>SetFocus(11)</onfocus>
</control>
<control type="button" id="102">
<visible>StringCompare(VideoPlayer.PlotOutline,Stream.IsPlaying) + !Skin.HasSetting(Stream.IsUpdated)</visible>
<onfocus>RunScript(script.test)</onfocus>
<onfocus>SetFocus(11)</onfocus>
</control>
<control type="button" id="103">
<visible>SubString(VideoPlayer.PlotOutline,Video.IsPlaying)</visible>
<onfocus>RunScript("/.xbmc/addons/script.hello.world/default.py","$INFO[VideoPlayer.Album]","$INFO[VideoPlayer.Genre]")</onfocus>
</control>
</controls>
</window>
>>>
As you can see of the onfocus element under id "103" at the end, the " are no longer in their original form, and it lead to bug if the "$INFO[VideoPlayer.Album]" variable contains nested quotes and become ""test"" which was invalid and error.
So is it any hacky way i can keep " in their original form ?
No comments:
Post a Comment