I have successfully read .docx files using the ElementTree package together with zipfile. But I realized that there is no 'word/document.xml' member in the archive for .doc files. I looked into the docs but did not find anything. How can a .doc file be read? For .docx, I used:
import zipfile as zf
import xml.etree.ElementTree as ET
z = zf.ZipFile("test.docx")
doc_xml = z.open('word/document.xml')
tree = ET.parse(doc_xml)
Using the above on a .doc file gives:
KeyError: "There is no item named 'word/document.xml' in the archive"
I saw a way to read files in the ElementTree docs, but that works for XML files only.
doc_xml = open('yesblue.doc','r')
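To narrow down what kind of file the .doc really is, a check along these lines might help. It is only a sketch: the function name sniff_word_format and its return labels are made up for illustration, and it assumes a legacy binary .doc starts with the usual OLE2 compound-file signature, while a .docx is a ZIP archive containing 'word/document.xml'.

```python
import zipfile

# Signature found at the start of OLE2 compound files (legacy binary .doc).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(path):
    """Return 'ole2', 'docx', 'zip', or 'unknown' for the file at path.

    Hypothetical helper: the labels are illustrative, not a standard.
    """
    with open(path, "rb") as fh:
        header = fh.read(8)

    # Legacy binary Word files are not ZIP archives at all.
    if header == OLE2_MAGIC:
        return "ole2"

    # A .docx is a ZIP archive holding word/document.xml.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            if "word/document.xml" in z.namelist():
                return "docx"
        return "zip"

    return "unknown"
```

A KeyError (rather than zipfile.BadZipFile) suggests the file did open as a ZIP archive but has a different layout inside, so printing z.namelist() would show which members it actually contains.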
How should I go about this one? Maybe something like converting .doc into .docx in Python itself.
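One possible way to do the conversion from Python, sketched under the assumption that LibreOffice is installed and its soffice executable is on the PATH, is to call its headless converter through subprocess. The helper names build_convert_command and convert_doc_to_docx are hypothetical, and this is not a pure-Python solution.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a .doc -> .docx conversion.

    Assumes the LibreOffice executable is reachable as `soffice`.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir="."):
    """Run the conversion and return the path where the .docx is expected."""
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

The resulting .docx could then be opened with the same zipfile and ElementTree code used for test.docx, for example by passing the result of convert_doc_to_docx("yesblue.doc") to zf.ZipFile.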