Python: remove non-tag content from XML



I want to remove all content that is not inside XML tags (cleanup) and optionally put it in a list. I have some XML like this:



<tag>some text</tag> unwanted text <tag>some text</tag>


and I want to get this with Python (regex):



('<tag>some text</tag>','<tag>some text</tag>')


I tried it with:



cleanup = re.findall(r"^<.>.*</.>$", input)


but I think the regex also matches the whole input. How can I fix this?
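
One possible direction, as a minimal sketch rather than a definitive answer: in the pattern above, "." matches exactly one character, so "<.>" only fits a one-letter tag name, and the "^" and "$" anchors together with a greedy ".*" tie the match to the whole input instead of to each element. The sketch below uses a non-greedy match with a backreference to the tag name. The names ELEMENT_RE and split_elements are made up for illustration, and it assumes flat elements (no nesting of the same tag name, no ">" inside attribute values).

```python
import re

# Hypothetical pattern: an opening tag, the shortest possible body,
# and a closing tag whose name equals the captured opening name (\1).
ELEMENT_RE = re.compile(r"<(\w+)\b[^>]*>.*?</\1>", re.DOTALL)


def split_elements(text):
    """Return (elements, unwanted): matched elements and the text between them."""
    elements, unwanted = [], []
    pos = 0
    for match in ELEMENT_RE.finditer(text):
        gap = text[pos:match.start()].strip()
        if gap:
            unwanted.append(gap)  # text that sits outside any element
        elements.append(match.group(0))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        unwanted.append(tail)
    return elements, unwanted


text = "<tag>some text</tag> unwanted text <tag>some text</tag>"
cleanup, unwanted = split_elements(text)
```

Here cleanup holds the two elements and unwanted holds the stray text. For nested or otherwise irregular markup a regex is likely to fall short, and a real parser would be the safer choice.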


Update 1:


I then try to load it with:



import xml.etree.ElementTree as ET
root = ET.fromstring(str(cleanup))
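
A likely reason this fails: str(cleanup) on a list produces its Python representation (brackets, quotes and commas), which is not XML, and an XML document also needs exactly one root element. A minimal sketch of one workaround, assuming cleanup is a list of element strings and using a made-up wrapper element named "root":

```python
import xml.etree.ElementTree as ET

# Assumed result of the cleanup step: a list of element strings.
cleanup = ["<tag>some text</tag>", "<tag>some text</tag>"]

# Join the fragments and wrap them in a single root element
# ("root" is an arbitrary name chosen for this sketch).
root = ET.fromstring("<root>" + "".join(cleanup) + "</root>")

# Each original element is now a child of the wrapper.
texts = [child.text for child in root]
```

After this, iterating over root yields the two tag elements, and texts contains their text content.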
