I have an XML document like this:
<!-- Location -->
<w:t>Lokacioni:</w:t>
<w:t>Kucni:</w:t>
<w:t>Extension:</w:t>
<w:t>Hajvali –Prishtinë</w:t>
<w:t>Rr. " Dëshmorët e Gollakut "</w:t>
<w:t>P. N. Prishtinë</w:t>

<!-- Date -->
<w:t>Dat:</w:t>
<w:t>Datum:</w:t>
<w:t>Date:</w:t>
<w:t xml:space="preserve"> </w:t>

<!-- Free text - contains time and description -->
<w:t>1.</w:t><w:t>08:05 Aksident trafiku me dëme materiale Audi dhe Kombi te Kisha Graqanic</w:t>

<!-- Checkboxes - 1 means it is checked -->
<w:t>Informuar:PK</w:t><w:checkBox><w:sizeAuto/><w:default w:val="1"/></w:checkBox>
<w:t>SHME</w:t><w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox>
<w:t>SHZSH</w:t><w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox>
<w:t>,Shërbimet tjera</w:t><w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox>
In Python, I want to select the values from this XML (generated from a .docx document) that have a checkbox. I wrote the following code:
    import zipfile
    from xml.etree.ElementTree import XML

    WordNameSpace = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
    para_tag = WordNameSpace + 'p'
    text_tag = WordNameSpace + 't'
    checkBox_tag = WordNameSpace + 'checkBox'

    def get_docx_text(path):
        # A .docx file is a zip archive; the body lives in word/document.xml
        document = zipfile.ZipFile(path)
        xml_content = document.read('word/document.xml')
        document.close()
        tree = XML(xml_content)

        paragraphs = []
        # Iterate over every checkBox element and collect the text nodes inside it
        for paragraph in tree.getiterator(checkBox_tag):
            texts = [node.text
                     for node in paragraph.getiterator(text_tag)
                     if node.text]
            if texts:
                paragraphs.append(''.join(texts))
        return paragraphs

    results = get_docx_text('test.docx')
    print results
When I print the results variable, the result is only []. Why is this happening?
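One likely explanation, going only by the XML shown above: each `w:checkBox` element contains `w:sizeAuto` and `w:default` but no `w:t` children, so the inner loop never finds any text and nothing is appended. The label text sits in a separate `w:t` element next to the checkbox. Below is a minimal Python 3 sketch of one way to pair them. It assumes the label is the nearest preceding non-empty `w:t` and that `w:default` holds the checked state, as the comment in the sample says. `SAMPLE`, the `w:body` wrapper, and `checkbox_states` are hypothetical names made up for this illustration; a real `document.xml` nests these elements more deeply and may record the state elsewhere.

```python
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

# Hypothetical, trimmed-down stand-in for word/document.xml
SAMPLE = (
    '<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:t>Informuar:PK</w:t>'
    '<w:checkBox><w:sizeAuto/><w:default w:val="1"/></w:checkBox>'
    '<w:t>SHME</w:t>'
    '<w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox>'
    '</w:body>'
)

def checkbox_states(root):
    """Pair each w:checkBox with the nearest preceding non-empty w:t text."""
    states = []
    last_text = None
    for node in root.iter():  # iter() walks the tree in document order
        if node.tag == W + 't' and node.text and node.text.strip():
            last_text = node.text.strip()
        elif node.tag == W + 'checkBox':
            default = node.find(W + 'default')
            # Namespaced attributes are keyed as '{namespace}name'
            checked = default is not None and default.get(W + 'val') == '1'
            states.append((last_text, checked))
    return states

root = ET.fromstring(SAMPLE)

# Same search as the original loop: text nodes *inside* each checkBox
texts_inside = [n.text
                for cb in root.iter(W + 'checkBox')
                for n in cb.iter(W + 't')]

result = checkbox_states(root)
print(texts_inside)
print(result)
```

With a real file, `root` would come from `ET.fromstring(zipfile.ZipFile(path).read('word/document.xml'))`. Note that `getiterator()` is no longer available in recent Python 3 releases; `iter()` is its replacement.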