Wednesday, 7 January 2015

How do I do thread-safe Python XML validation?



Using Python 3.3, I need to validate XML documents against their DTDs or XSDs, and I expect to validate many documents against each specification. A multi-threaded application will perform the validation. The lxml documentation explains how to validate against each specification type.


lxml records validation errors in an error log on the specification object itself, so I will need a separate copy of the specification for each validation I perform.
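
Because the error log lives on the validator object, one way to keep threads from reading each other's errors is to give each thread its own validator, built once per thread from spec bytes that were read at launch. The following is only a sketch of that idea, not a confirmed answer: `PerThreadValidators` and `build_xmlschema` are hypothetical names, and the XSD is assumed to be self-contained (no includes or imports that need a base URL).

```python
import io
import threading


class PerThreadValidators:
    """Hypothetical helper: one validator per thread, built lazily
    from spec bytes that were read once at application launch."""

    def __init__(self, spec_bytes, build):
        self._spec_bytes = spec_bytes
        self._build = build              # callable: bytes -> validator
        self._local = threading.local()  # storage private to each thread

    def get(self):
        validator = getattr(self._local, "validator", None)
        if validator is None:
            # First call in this thread: parse the in-memory spec (no file I/O).
            validator = self._build(self._spec_bytes)
            self._local.validator = validator
        return validator


def build_xmlschema(xsd_bytes):
    # Imported lazily so the helper above does not depend on lxml.
    from lxml import etree
    # Assumption: the XSD does not pull in other files.
    return etree.XMLSchema(etree.parse(io.BytesIO(xsd_bytes)))
```

A worker thread would then call `schema = validators.get()`, run `schema.validate(doc)`, and read `schema.error_log` before validating the next document in that same thread. The spec is still parsed once per thread, but from memory rather than from disk.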


Re-parsing the DTD specification each time is not thread-safe: my DTD includes other files, and I have found it necessary to change directories to the folder containing the DTD files for lxml to find them. Since I cannot cd in a thread-safe way, I read all the specs at application launch.
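
One possible way around the directory change is lxml's custom resolver hook (`etree.Resolver`, registered through `parser.resolvers.add`), which lets a parser map referenced DTD files onto a known folder. This is an untested sketch under several assumptions: the documents declare their DTD in a DOCTYPE, all DTD files sit flat in one directory, and `resolve_in_dir`, `make_dtd_parser`, and `SpecDirResolver` are hypothetical names.

```python
import os


def resolve_in_dir(spec_dir, system_url):
    """Map a system URL onto an absolute path inside spec_dir.

    Assumption: only the file name matters, so any directory part
    of the URL is discarded.
    """
    name = os.path.basename(system_url)
    return os.path.join(os.path.abspath(spec_dir), name)


def make_dtd_parser(spec_dir):
    # Imported lazily so resolve_in_dir stays usable without lxml.
    from lxml import etree

    class SpecDirResolver(etree.Resolver):
        def resolve(self, system_url, public_id, context):
            path = resolve_in_dir(spec_dir, system_url)
            if os.path.exists(path):
                return self.resolve_filename(path, context)
            return None  # let lxml fall back to its default lookup

    # The parser validates each document against the DTD named in its DOCTYPE.
    parser = etree.XMLParser(load_dtd=True, dtd_validation=True)
    parser.resolvers.add(SpecDirResolver())
    return parser
```

Creating one such parser per thread would keep the lookup independent of the process-wide working directory, though whether the resolver is consulted for every nested include is something to verify against the DTDs in question.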


Re-parsing either specification (XSD or DTD) is also undesirable because it costs I/O time and parsing time.


My attempts at copy and deepcopy of the spec (DTD and XMLSchema objects) failed outright.


Is there a way to get lxml to validate safely? Is there a better library that supports both XSD and DTD and lets me check errors in a thread-safe way?

