I'm trying to print out portions of an XML document for documentation purposes in a bigger program.
In this output I don't need any namespace attributes, and I would like to remove them for better readability.
I started with:
```python
from lxml import etree, objectify

xmlstring = """
<root ns="http://ns.dettorer.net" xmlns="http://ift.tt/1u1q0X5" xmlns:scp="http://ift.tt/1u1q2OK">
<tag att='foo' att2='bar' />
<tag att='bar' att2='foo' />
</root>
"""

xml = etree.XML(xmlstring)
for sub_xml in xml:
    print(etree.tounicode(sub_xml))
```
which gives:
```xml
<tag xmlns="http://ift.tt/1u1q0X5" xmlns:scp="http://ift.tt/1u1q2OK" att="foo" att2="bar"/>
<tag xmlns="http://ift.tt/1u1q0X5" xmlns:scp="http://ift.tt/1u1q2OK" att="bar" att2="foo"/>
```
I found lxml.objectify.deannotate in another question, and it gives almost what I want:
```python
import copy

working_xml = copy.deepcopy(xml)
objectify.deannotate(working_xml, cleanup_namespaces=True)
for sub_xml in working_xml:
    print(etree.tounicode(sub_xml))
```
But the xmlns attribute is still there:
```xml
<tag xmlns="http://ift.tt/1u1q0X5" att="foo" att2="bar"/>
<tag xmlns="http://ift.tt/1u1q0X5" att="bar" att2="foo"/>
```
I tried to remove it with etree.strip_attributes(sub_xml, 'xmlns'), which simply doesn't remove it, and with sub_xml.attrib.pop('xmlns') and del sub_xml.attrib['xmlns'], which both raise a KeyError. I guess this is because xmlns is not actually an attribute of my elements but a namespace declaration.
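A quick way to check that guess is to inspect one of the elements directly. The sketch below redefines the xmlstring from above so it runs on its own; it shows that lxml keeps the namespace inside the element's tag (in {uri}localname form) and reports the in-scope declarations through nsmap, so attrib never contains an xmlns key that could be popped or deleted.

```python
from lxml import etree

xmlstring = """
<root ns="http://ns.dettorer.net" xmlns="http://ift.tt/1u1q0X5" xmlns:scp="http://ift.tt/1u1q2OK">
<tag att='foo' att2='bar' />
<tag att='bar' att2='foo' />
</root>
"""

xml = etree.XML(xmlstring)
first = xml[0]

# The namespace is part of the element name itself, in '{uri}localname' form.
print(first.tag)                # {http://ift.tt/1u1q0X5}tag
# Declarations in scope (inherited from <root>) are exposed via nsmap,
# keyed by prefix; None is the key for the default namespace.
print(first.nsmap[None])        # http://ift.tt/1u1q0X5
# attrib only holds real attributes, so there is no 'xmlns' entry to remove.
print(dict(first.attrib))       # {'att': 'foo', 'att2': 'bar'}
print("xmlns" in first.attrib)  # False
```

This is why stripping the "attribute" cannot work: as long as the tag stays {http://ift.tt/1u1q0X5}tag, the serializer has to emit a matching xmlns declaration.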
Is there a way to either tell lxml.objectify.deannotate to also remove that inherited xmlns attribute, or to remove it afterwards? I've also considered using a regex to strip the attribute after the .tounicode() conversion, but I'm looking for a cleaner and more sensible approach.
I also cannot make a deep copy of the whole XML document (and then remove the xmlns attribute from the root node), since the real document and my program structure are much bigger than this example.
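One possible approach, sketched here under the assumption that the printed fragments never need their namespace information: deep-copy only the subtree being printed (not the whole document), rename every namespaced tag and attribute to its local name with etree.QName, and then let etree.cleanup_namespaces drop the declarations that nothing uses any more. strip_namespaces is a hypothetical helper name; objectify.deannotate(..., cleanup_namespaces=True) should perform the same final cleanup if you prefer to keep calling it.

```python
import copy

from lxml import etree

xmlstring = """
<root ns="http://ns.dettorer.net" xmlns="http://ift.tt/1u1q0X5" xmlns:scp="http://ift.tt/1u1q2OK">
<tag att='foo' att2='bar' />
<tag att='bar' att2='foo' />
</root>
"""


def strip_namespaces(element):
    """Return a namespace-free copy of `element`; the original stays untouched.

    Hypothetical helper: only the given subtree is copied, not the document.
    """
    clone = copy.deepcopy(element)
    clone.tail = None  # drop the whitespace tail copied along with the element
    for el in clone.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(el.tag, str):
            continue
        # '{http://ift.tt/1u1q0X5}tag' -> 'tag'
        el.tag = etree.QName(el).localname
        # Same treatment for prefixed attributes such as '{uri}name'
        # (a plain attribute with the same local name would be overwritten).
        for key in list(el.attrib):
            local = etree.QName(key).localname
            if local != key:
                el.attrib[local] = el.attrib.pop(key)
    # No element or attribute uses the namespaces now, so their
    # xmlns / xmlns:scp declarations are removed here.
    etree.cleanup_namespaces(clone)
    return clone


xml = etree.XML(xmlstring)
for sub_xml in xml:
    print(etree.tostring(strip_namespaces(sub_xml), encoding="unicode"))
# prints:
# <tag att="foo" att2="bar"/>
# <tag att="bar" att2="foo"/>
```

Because the renaming happens on a per-fragment copy, the original tree keeps its namespaces for the rest of the program; the cost is one copy per printed fragment. Dropping namespaces can make elements from different namespaces with the same local name look identical, which is fine for readability output but would be wrong if the result were processed further.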