Opening BBC XML file in Python using urllib2 and lxml gives me a 406 error



I am extremely new to XML and have what I hope is a simple question. I am trying to open an XML file at a location given to me by the BBC. If I click on the link in Firefox, sure enough, I get a page of XML.


But using Python 2.7.6 under Ubuntu 14.04, if I attempt the following fragment of code:



import urllib2

xmldoc="http://ift.tt/1BPAzTV"

u = urllib2.urlopen(xmldoc)


I get:



Traceback (most recent call last):
  File "/home/tim/metatron/Projects/R4/tp.py", line 7, in <module>
    u = urllib2.urlopen(xmldoc)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 406: Not Acceptable


I've never come across a 406 error before. I would note that the URI I have been given by the BBC differs from others I have seen, which usually end in .../something.xml.


What am I doing wrong? Specifically, why can Firefox open the file but not Python? Is there some sort of default XML file that would be loaded (analogous to index.html)?
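For context, HTTP 406 means the server could not produce a response matching the Accept headers of the request. One possible explanation, which I have not confirmed for this BBC URL, is that urllib2's default request headers differ from the ones Firefox sends. Below is a minimal sketch of sending browser-like headers; the helper names build_request and fetch and the specific header values are my own assumptions, not anything the BBC documents.

```python
try:
    import urllib2 as request_lib  # Python 2, as used in the question
except ImportError:
    import urllib.request as request_lib  # Python 3 equivalent


def build_request(url):
    """Build a request carrying explicit Accept and User-Agent headers.

    The header values are illustrative assumptions; the server may
    expect something different.
    """
    headers = {
        "Accept": "application/xml, text/xml, */*",
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64)",
    }
    return request_lib.Request(url, headers=headers)


def fetch(url):
    """Open the URL with the headers above and return the raw body."""
    response = request_lib.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch(xmldoc) instead of urllib2.urlopen(xmldoc) would show whether the headers are the cause; if the 406 persists, the problem lies elsewhere.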


Similarly, if I do root=lxml.etree.parse(xmldoc), I get:



Traceback (most recent call last):
  File "/home/tim/metatron/Projects/R4/tp.py", line 8, in <module>
    root=lxml.etree.parse(xmldoc)
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
  File "parser.pxi", line 1748, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102066)
  File "parser.pxi", line 1774, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:102330)
  File "parser.pxi", line 1678, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:101365)
  File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:96817)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
  File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91722)
IOError: Error reading file 'http://ift.tt/1uKcr2B\
rand/bbc_radio_four/': failed to load HTTP resource
