I'm trying to parse the following feed into ElementTree in Python: "http://ift.tt/1vSUA85" (warning: large file).
Here is what I have tried so far:
import gzip
import StringIO
import urllib
import xml.etree.ElementTree as ET

feed = urllib.urlopen("http://ift.tt/1vSUA85")
# feed is compressed
compressed_data = feed.read()
compressedstream = StringIO.StringIO(compressed_data)
gzipper = gzip.GzipFile(fileobj=compressedstream)
data = gzipper.read()
# Parse XML
tree = ET.parse(data)
but it seems to hang on compressed_data = feed.read(), perhaps indefinitely. (I know it's a big file, but this takes far too long compared with other, uncompressed feeds I have parsed, and a delay this large cancels out any bandwidth gain from the gzip compression in the first place.)
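One thing that may help here, offered as a sketch and not as a confirmed fix for the hang: ET.parse accepts an open file object as well as a filename, so the GzipFile can be handed to it directly instead of reading the whole decompressed document into a string first. The helper name parse_gzipped_xml and the in-memory sample data are made up for illustration; with the real feed, the downloaded bytes would be wrapped in a file-like object as in the attempt above.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def parse_gzipped_xml(fileobj):
    """Parse gzip-compressed XML from a binary file-like object (hypothetical helper)."""
    # GzipFile decompresses on demand as ET.parse reads from it.
    gz = gzip.GzipFile(fileobj=fileobj)
    try:
        return ET.parse(gz)
    finally:
        gz.close()


# Self-contained demonstration on in-memory sample data (no network needed).
raw = io.BytesIO()
gz_out = gzip.GzipFile(fileobj=raw, mode="wb")
gz_out.write(b"<feed><item>a</item><item>b</item></feed>")
gz_out.close()
raw.seek(0)

tree = parse_gzipped_xml(raw)
print(tree.getroot().tag)
```

On Python 3.2 and later, gzip can also read from unseekable streams, so an HTTP response object opened with urllib.request.urlopen could probably be passed to the helper directly.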
Next I tried requests, with
url = "http://ift.tt/1vSUA85"
payload = {'accept-encoding': 'gzip, deflate'}
r = requests.get(url, params=payload, stream=True)
but then parsing the response with either
tree = ET.parse(r.content)
or
tree = ET.parse(r.text)
raises an exception.
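A likely cause, assuming the exceptions come from ET.parse itself: ET.parse expects a filename or a file object, whereas r.content is the response body as bytes, which ET.fromstring can parse. Separately, params= in requests builds the URL query string; request headers are passed with headers=. The sketch below uses only the standard library. root_from_bytes is a hypothetical helper and the sample bodies stand in for r.content. It checks for the gzip magic bytes because it is not clear from the above whether the body arrives still compressed.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def root_from_bytes(body):
    """Return the XML root element from a response body held as bytes."""
    # A gzip stream starts with the two magic bytes 0x1f 0x8b.
    if body[:2] == b"\x1f\x8b":
        body = gzip.GzipFile(fileobj=io.BytesIO(body)).read()
    # fromstring parses XML held in memory; ET.parse wants a file or filename.
    return ET.fromstring(body)


# Sample bodies standing in for r.content (assumed data, no network needed).
sample_xml = b"<feed><item>a</item></feed>"
buf = io.BytesIO()
gz_out = gzip.GzipFile(fileobj=buf, mode="wb")
gz_out.write(sample_xml)
gz_out.close()

root_plain = root_from_bytes(sample_xml)
root_gzipped = root_from_bytes(buf.getvalue())
print(root_plain.tag)
```

Note that ET.fromstring returns the root Element, not an ElementTree, so code that expects tree.getroot() would use the returned element directly.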
What's the proper way to do this?
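Since the file is large, one option worth considering is incremental parsing with ET.iterparse, which yields elements as they are read instead of building the whole tree up front. This is a minimal sketch under assumptions: the helper name count_elements, the tag name "item", and the sample data are all hypothetical, because the feed's actual structure is not shown here.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_elements(fileobj, tag):
    """Count elements named `tag` in gzip-compressed XML, read incrementally."""
    count = 0
    gz = gzip.GzipFile(fileobj=fileobj)
    try:
        # An "end" event fires once an element's closing tag has been read.
        for _event, elem in ET.iterparse(gz, events=("end",)):
            if elem.tag == tag:
                count += 1
                # Drop the element's text and children once it has been handled.
                elem.clear()
    finally:
        gz.close()
    return count


# In-memory sample standing in for the downloaded feed (assumed structure).
sample = io.BytesIO()
gz_out = gzip.GzipFile(fileobj=sample, mode="wb")
gz_out.write(b"<feed><item>a</item><item>b</item><item>c</item></feed>")
gz_out.close()
sample.seek(0)

item_count = count_elements(sample, "item")
print(item_count)
```

In real use, the counting step would be replaced by whatever per-element processing is needed before the element is cleared.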