I want to remove all content that is not inside XML tags (a cleanup step) and optionally put the remaining elements in a list. I have some XML like this:
<tag>some text</tag> unwanted text <tag>some text</tag>
and I want to get this with Python (regex):
('<tag>some text</tag>','<tag>some text</tag>')
I tried it with:
import re
cleanup = re.findall(r"^<.>.*</.>$", input)
but I think the whole input also matches the regex. How can I fix this?
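A minimal regex sketch, assuming flat (non-nested) elements whose opening tags carry no attributes. In the attempt above, `^` and `$` anchor the pattern to the start and end of the whole string and `.*` is greedy, so any match would span the entire input; in addition, `<.>` matches exactly one character, so it cannot match `<tag>`. The version below drops the anchors, uses a lazy `.*?`, and requires the closing tag to repeat the opening tag's name. The variable name `xml_text` is my own choice (to avoid shadowing the built-in `input`).

```python
import re

# Sample input from the question, with stray text between the elements.
xml_text = "<tag>some text</tag> unwanted text <tag>some text</tag>"

# <(\w+)> captures the opening tag name, .*? matches as little as possible,
# and </\1> requires the closing tag to use the same name. With no ^/$
# anchors, the pattern can match several times in one string.
# re.DOTALL lets element content span multiple lines.
pattern = re.compile(r"<(\w+)>.*?</\1>", re.DOTALL)

# findall() would return only the captured group (the tag name), so use
# finditer() and take each whole match with group(0).
cleanup = tuple(m.group(0) for m in pattern.finditer(xml_text))

print(cleanup)  # ('<tag>some text</tag>', '<tag>some text</tag>')
```

This breaks down for nested elements with the same name or for tags with attributes, which is where a real XML parser is the safer choice.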
Update 1:
I try to load it with:
import xml.etree.ElementTree as ET
root = ET.fromstring(str(cleanup))
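`str(cleanup)` turns the list into its Python representation (`"['<tag>some text</tag>', ...]"`), which is not XML. Even the joined fragments would not parse, because a well-formed XML document has exactly one top-level element. A sketch that avoids regex entirely, assuming the input is otherwise well-formed XML: wrap it in a placeholder root element (the name `root` is arbitrary), parse it, and clear the `.tail` text that ElementTree attaches to each element, which is where the unwanted text between elements ends up.

```python
import xml.etree.ElementTree as ET

xml_text = "<tag>some text</tag> unwanted text <tag>some text</tag>"

# XML allows only one top-level element, so wrap the fragments in a
# placeholder root; the name "root" is arbitrary.
root = ET.fromstring("<root>" + xml_text + "</root>")

# Text before the first child is stored in root.text; text after each
# child is stored in that child's .tail. Clearing both removes it.
root.text = None
for child in root:
    child.tail = None

# tostring() serializes an element together with its tail, which is why
# the tails were cleared first. encoding="unicode" returns a str.
cleanup = tuple(ET.tostring(child, encoding="unicode") for child in root)
cleaned_doc = ET.tostring(root, encoding="unicode")

print(cleanup)      # ('<tag>some text</tag>', '<tag>some text</tag>')
print(cleaned_doc)  # <root><tag>some text</tag><tag>some text</tag></root>
```

If you keep the regex result instead, the same wrapping idea applies: `ET.fromstring("<root>" + "".join(cleanup) + "</root>")`.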