Reading a .doc file with ElementTree



I have successfully read .docx files with the ElementTree package by opening them with zipfile. But I realized that .doc files have no 'word/document.xml' archive member. I looked through the docs but did not find anything about this. How can a .doc file be read? For .docx, I used:



import zipfile as zf
import xml.etree.ElementTree as ET
z = zf.ZipFile("test.docx")
doc_xml = z.open('word/document.xml')
tree = ET.parse(doc_xml)


Using the same code on a .doc file gives:



KeyError: "There is no item named 'word/document.xml' in the archive"
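A quick way to see what kind of container the file really is might look like the sketch below. It relies on two facts: legacy binary .doc files are OLE2 compound files that start with a fixed 8-byte signature, and .docx files are ZIP archives containing 'word/document.xml'. The helper name sniff_word_format and its return labels are made up for illustration. Note that zipfile raises BadZipFile for a file that is not a ZIP at all, so the KeyError above suggests the file did open as a ZIP but lacks that member.

```python
import zipfile

# Signature at the start of an OLE2 compound file (the legacy binary .doc container).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_word_format(path):
    """Hypothetical helper: return a rough guess at the container format."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE2_MAGIC:
        # Binary .doc: not a ZIP, so there is no word/document.xml to parse.
        return "ole2"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            if "word/document.xml" in z.namelist():
                return "docx"
        # A ZIP archive, but not laid out like a .docx.
        return "zip"
    return "unknown"
```

If this returns "zip", printing zipfile.ZipFile(path).namelist() shows which members the archive actually contains.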


I saw a way to read files in the ElementTree docs, but that works for XML files only.



doc_xml = open('yesblue.doc','r')


How should I go about this one? Maybe by converting the .doc to .docx from within Python itself?
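A conversion step of that kind might look like the sketch below. It assumes LibreOffice is installed and that its soffice executable is on the PATH; the function names build_convert_command and convert_doc_to_docx are hypothetical, and this has not been verified against yesblue.doc.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless .doc -> .docx conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir="."):
    """Run the conversion and return the path where the .docx is expected."""
    # Assumption: soffice is installed; check=True raises if it exits non-zero.
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

The resulting .docx could then go through the same zipfile and ElementTree code used for test.docx.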

