I am using lxml to validate Product elements as they stream through a MapReduce job. I want to make sure that every element carries the correct xmlns value and nothing else. For example, every Product element should have its xmlns set to "http://mynetwork.products.com/new":
<Product xmlns="http://mynetwork.products.com/new">
As I check each Product element (streamed one at a time), I just want to make sure that it looks like the above. I want to check for the following potential errors:
- Incorrect xmlns URL:
<Product xmlns="http://mynetwork.products.com/old">
- Missing URL
<Product xmlns="">
- Missing xmlns key/value pair
<Product>
- Extra attribute in the Product element
<Product xmlns="http://mynetwork.products.com/new" something="else">
I tried storing Product.nsmap (a dictionary) for each element and then reading its values to validate, but that doesn't help me detect any of the cases above. There must be a way.
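One possible direction, offered as a rough sketch rather than a definitive answer: instead of nsmap, look at the element's tag and attrib. A parsed element in a namespace has a tag of the form "{namespace}localname", and xmlns declarations do not appear in attrib, so anything found there is an extra attribute. The names check_product and EXPECTED_NS below are hypothetical, and the fallback import is only there so the sketch runs without lxml installed. Note the assumption that <Product> and <Product xmlns=""> both parse to an element with no namespace, so this check reports them the same way.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

# Hypothetical constant: the only namespace a Product may use.
EXPECTED_NS = "http://mynetwork.products.com/new"


def check_product(elem):
    """Return a list of problems found on one parsed Product element."""
    problems = []

    # A namespaced tag looks like "{namespace}localname".
    tag = elem.tag
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
    else:
        namespace, local_name = None, tag

    if local_name != "Product":
        problems.append("unexpected element: " + local_name)

    if namespace is None:
        # Covers both <Product> and <Product xmlns="">.
        problems.append("missing namespace")
    elif namespace != EXPECTED_NS:
        problems.append("incorrect namespace: " + namespace)

    # xmlns declarations are not listed in attrib, so every entry is extra.
    for name in elem.attrib:
        problems.append("unexpected attribute: " + name)

    return problems
```

For example, check_product(etree.fromstring(b'<Product xmlns="http://mynetwork.products.com/old"/>')) returns a one-item list naming the incorrect namespace, while a correct element returns an empty list. This sketch does not distinguish a default namespace from a prefixed one such as <p:Product xmlns:p="...">; with lxml, checking that elem.prefix is None would be one way to catch that.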