XML: How to check xmlns in every element using lxml

I am using lxml to check Product elements as they stream in a MapReduce job. I am trying to make sure that only the correct xmlns value is present in every element. For example, every Product element should have an xmlns set to "http://mynetwork.products.com/new":

<Product xmlns="http://mynetwork.products.com/new">

As I check each Product element (streamed one at a time), I just want to make sure that it looks like the above. I want to check for the following potential errors:

  1. Incorrect xmlns URL

<Product xmlns="http://mynetwork.products.com/old">

  2. Missing URL

<Product xmlns="">

  3. Missing xmlns key/value pair

<Product>

  4. Extra attribute in the Product element

<Product xmlns="http://mynetwork.products.com/new" something="else">

I tried storing the value of Product.nsmap (a dictionary) for each element and then reading its values to validate, but that does not help me detect any of the cases above. There must be a way.
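
One possible approach, offered as a rough sketch rather than a definitive answer, is to skip nsmap and read the namespace from the element's tag, which lxml reports in the form "{namespace}localname". Regular attributes live in element.attrib, which does not list xmlns declarations, so anything found there is extra. The function name check_product and the wording of the problem messages are made up for illustration, and the sketch assumes each Product element arrives as its own serialized string. The fallback import lets it run where lxml is not installed, since the standard-library parser exposes the same tag and attrib interface.

```python
try:
    from lxml import etree
except ImportError:  # stdlib parser offers the same tag/attrib interface
    import xml.etree.ElementTree as etree

# Namespace taken from the question.
EXPECTED_NS = "http://mynetwork.products.com/new"


def check_product(xml_text):
    """Return a list of problems found in one serialized Product element.

    Hypothetical helper: the name and the message strings are illustrative.
    """
    element = etree.fromstring(xml_text)

    # Namespaced tags look like "{namespace}localname"; split the two parts.
    tag = element.tag
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
    else:
        namespace, local_name = None, tag

    problems = []
    if local_name != "Product":
        problems.append("unexpected element: " + local_name)
    if namespace is None:
        # Covers both <Product> and <Product xmlns="">.
        problems.append("missing namespace")
    elif namespace != EXPECTED_NS:
        problems.append("wrong namespace: " + namespace)
    # xmlns declarations are not listed in attrib, so anything here is extra.
    if element.attrib:
        problems.append("extra attributes: " + ", ".join(sorted(element.attrib)))
    return problems


if __name__ == "__main__":
    samples = [
        '<Product xmlns="http://mynetwork.products.com/new"/>',
        '<Product xmlns="http://mynetwork.products.com/old"/>',
        '<Product xmlns=""/>',
        '<Product/>',
        '<Product xmlns="http://mynetwork.products.com/new" something="else"/>',
    ]
    for sample in samples:
        print(sample, "->", check_product(sample) or "ok")
```

Note that <Product xmlns=""> and <Product> both parse to an element with no namespace, so this sketch reports them the same way. Telling them apart would likely require inspecting the raw text before parsing.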
