I'm pulling in data from a database and attempting to create an XML file from this data. The data is in UTF-8 and can contain characters such as á, š, or č. This is the code:
import xml.etree.cElementTree as ET
tree = ET.parse(metadata_file)
# ..some commands that alter the XML..
tree.write(metadata_file, encoding="UTF-8")
When writing the data, the script fails with:
Traceback (most recent call last):
  File "get-data.py", line 306, in <module>
    main()
  File "get-data.py", line 303, in main
    tree.write(metadata_file, encoding="UTF-8")
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)
The only way I have found to prevent this is to decode the data before it is written to the XML file:
text = text.decode('utf-8')
but then the resulting file contains a numeric character reference such as &#269; rather than the literal character č. Any idea how I can write the data to the file and keep it in UTF-8?
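For reference, here is a minimal sketch of the decode-first approach described above. It is not a confirmed fix for this script. It assumes the database values arrive as UTF-8 byte strings; the helper `to_text`, the element names, and the sample value are made up for illustration. In ElementTree, numeric character references normally appear only when the output encoding cannot represent a character (for example the default `us-ascii`), so decoded text written with `encoding="UTF-8"` would be expected to come out as literal characters.

```python
# -*- coding: utf-8 -*-
import io
import xml.etree.ElementTree as ET


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as Unicode text, decoding raw bytes first."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Assumed sample: the UTF-8 bytes for "čáš", standing in for a database value.
raw = b"\xc4\x8d\xc3\xa1\xc5\xa1"

root = ET.Element("metadata")          # illustrative element names
title = ET.SubElement(root, "title")
title.text = to_text(raw)              # store Unicode text in the tree, not bytes

# Serialize to an in-memory buffer; a file name could be passed instead.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="UTF-8")
xml_bytes = buf.getvalue()
```

If the output still contains character references, it may be worth checking that the `encoding` argument really reaches every `write()` call and that the text is decoded exactly once.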