XML: Scrapy response.xpath returns an empty list on an XML document with a default namespace, while response.selector.re works

I am new to Scrapy and was playing with the Scrapy shell, trying to crawl this site: www.spiegel.de/sitemap.xml

I started the shell with

  scrapy shell "http://www.spiegel.de/sitemap.xml"    

and that works fine. When I use

  response.body     

I can see the whole page, including the XML tags.

However, a query such as this:

  response.xpath('//loc')     

simply won't work.

The result I get is an empty list,

while

  response.selector.re('somevalidregexpexpression')     

does work.

Any idea what the reason could be? Could it be related to encoding? The site is not UTF-8.

I am using Python 2.7 on Windows 7. I tried xpath() on another site (dmoz) and it worked fine.
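
For what it's worth, the default namespace mentioned in the title looks like a more likely cause than the encoding. Here is a minimal sketch of the effect using only the standard library's ElementTree, not Scrapy itself. The sample document and the example.com URLs are made up, and I am assuming the sitemap declares the usual sitemaps.org namespace as its default.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the sitemap; the namespace URI is an assumption
# (the standard sitemaps.org one), and the URLs are invented.
SITEMAP = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<sitemap><loc>http://example.com/sitemap-1.xml</loc></sitemap>'
    '<sitemap><loc>http://example.com/sitemap-2.xml</loc></sitemap>'
    '</sitemapindex>'
)

root = ET.fromstring(SITEMAP)

# An unprefixed name only matches elements in *no* namespace,
# so nothing is found even though <loc> is clearly in the text.
plain = root.findall('.//loc')

# Binding a prefix (the name "sm" is arbitrary) to the default
# namespace URI makes the same query match.
ns = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
qualified = [el.text for el in root.findall('.//sm:loc', ns)]

print(plain)
print(qualified)
```

If the same thing is happening in the Scrapy shell, then as far as I understand either calling response.selector.remove_namespaces() before response.xpath('//loc'), or calling response.selector.register_namespace('sm', ...) with the namespace URI and then querying '//sm:loc', should return the elements. A regex works on the raw text and ignores namespaces, which would explain why re() finds matches while xpath() does not.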
