Parsing compressed XML feed into ElementTree

I'm trying to parse the following feed into ElementTree in Python: "http://ift.tt/1vSUA85" (warning: large file).

Here is what I have tried so far:


import gzip
import StringIO
import urllib
import xml.etree.ElementTree as ET

feed = urllib.urlopen("http://ift.tt/1vSUA85")

# The feed is gzip-compressed, so read all of it and decompress in memory
compressed_data = feed.read()
compressedstream = StringIO.StringIO(compressed_data)
gzipper = gzip.GzipFile(fileobj=compressedstream)
data = gzipper.read()

# Parse XML
tree = ET.parse(data)

but it seems to hang on compressed_data = feed.read(), possibly forever. I know it's a big file, but this takes far longer than the uncompressed feeds I have parsed, and a delay this long cancels out any bandwidth gain from the gzip compression in the first place.

Next I tried requests, with


import requests

url = "http://ift.tt/1vSUA85"
payload = {'accept-encoding': 'gzip, deflate'}
r = requests.get(url, params=payload, stream=True)

I then tried each of these in turn:


tree = ET.parse(r.content)

tree = ET.parse(r.text)

but both raise exceptions.
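
A likely cause of those exceptions is that ET.parse expects a filename or a file-like object, so a string holding the document gets treated as a path, whereas ET.fromstring accepts the document itself. The sketch below shows both routes on a small in-memory stand-in for the downloaded bytes. It is written for Python 3, and compressed_data here is a hypothetical sample, not the real feed. Separately, in requests the Accept-Encoding value is a request header, so it would normally go in headers= rather than params=.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes downloaded from the feed.
compressed_data = gzip.compress(b"<feed><item>a</item></feed>")
data = gzip.decompress(compressed_data)

# Route 1: hand the document bytes to fromstring, which returns the root Element.
root_from_bytes = ET.fromstring(data)

# Route 2: wrap the bytes in a file-like object, which is what parse expects.
tree = ET.parse(io.BytesIO(data))
root_from_file = tree.getroot()
```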

What's the proper way to do this?
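
One possible approach, sketched under assumptions: rather than reading the whole download into memory first, wrap the response in gzip.GzipFile and let ET.iterparse pull decompressed bytes as it goes. This assumes the server really sends a gzip file, and it is written for Python 3. The function name iter_items and the tag name "item" are made up for illustration, and a small in-memory sample stands in for the real feed.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_items(compressed_stream, tag):
    """Yield each element named `tag` from a gzip-compressed XML stream."""
    # GzipFile decompresses lazily, as the parser asks for more bytes.
    with gzip.GzipFile(fileobj=compressed_stream) as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == tag:
                yield elem
                # Drop the element's text and children once the caller is done.
                elem.clear()


# Stand-in for the real feed: a tiny gzip-compressed document in memory.
sample_xml = b"<feed><item>a</item><item>b</item></feed>"
sample_stream = io.BytesIO(gzip.compress(sample_xml))
texts = [item.text for item in iter_items(sample_stream, "item")]
```

With a real download, the HTTP response object would take the place of sample_stream. Whether that helps with the apparent hang depends on the server and the size of the file, which this sketch cannot confirm.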
