Parsing compressed xml feed into ElementTree



I'm trying to parse the following feed into ElementTree in Python: "http://ift.tt/1vSUA85" (warning: large file).


Here is what I have tried so far:



import gzip
import StringIO
import urllib
import xml.etree.ElementTree as ET

feed = urllib.urlopen("http://ift.tt/1vSUA85")

# The feed is gzip-compressed: read it all, then decompress it in memory
compressed_data = feed.read()
compressedstream = StringIO.StringIO(compressed_data)
gzipper = gzip.GzipFile(fileobj=compressedstream)
data = gzipper.read()

# Parse XML
tree = ET.parse(data)


But it seems to hang on compressed_data = feed.read(), possibly indefinitely. I know it's a big file, but it takes far longer than other, non-compressed feeds I have parsed, and a download this slow cancels out any bandwidth gain from the gzip compression in the first place.
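Part of the cost comes from buffering everything before any parsing starts: feed.read() holds the whole compressed file in memory, and gzipper.read() then builds the full decompressed document as one more string. A minimal sketch of a streaming alternative, written for Python 3 (where gzip.GzipFile can read from non-seekable streams such as an HTTP response) and assuming the records are elements named item; that tag name and the helper iter_records are hypothetical, so substitute the feed's real element name:

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_records(compressed_stream, tag="item"):
    """Yield the text of each <tag> element from a gzip-compressed XML stream.

    tag="item" is a hypothetical default; use the feed's real element name.
    """
    # GzipFile decompresses lazily as data is read from the underlying stream,
    # so the whole compressed file never has to sit in memory at once.
    with gzip.GzipFile(fileobj=compressed_stream) as xml_stream:
        # iterparse reads the stream in chunks and reports each element
        # once its closing tag has been parsed.
        for event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == tag:
                yield elem.text
                # Drop the element's children and text so memory use
                # does not grow with the size of the document.
                elem.clear()


# Local demo: an in-memory gzip file stands in for the HTTP response.
sample = gzip.compress(b"<feed><item>a</item><item>b</item></feed>")
print(list(iter_records(io.BytesIO(sample))))  # ['a', 'b']
```

Against the live feed, the argument would be the response object from urllib.request.urlopen("http://ift.tt/1vSUA85"). Streaming does not make the download itself faster, but parsing begins as soon as data arrives and memory use stays far lower than with read().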


Next I tried requests, with



import requests

url = "http://ift.tt/1vSUA85"
payload = {'accept-encoding': 'gzip, deflate'}
r = requests.get(url, params=payload, stream=True)
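Two details of requests matter here. The params= argument builds the URL's query string, so payload is sent as ?accept-encoding=... rather than as an HTTP header (headers= sets headers, and requests already sends an Accept-Encoding header that includes gzip by default). requests also transparently decodes a response sent with Content-Encoding: gzip, so r.content is normally plain XML. If the server instead serves the .gz file itself without that header, r.content is still compressed; whether this feed does so is an assumption worth checking via r.headers.get('Content-Encoding'). A sketch, using a hypothetical helper maybe_gunzip, that decompresses only when the bytes start with the gzip magic number:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def maybe_gunzip(body: bytes) -> bytes:
    """Return body decompressed if it looks like gzip data, otherwise unchanged.

    maybe_gunzip is a hypothetical helper, not part of requests.
    """
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


print(maybe_gunzip(gzip.compress(b"<feed/>")))  # b'<feed/>'
```

With requests this would be maybe_gunzip(r.content). Note that accessing r.content downloads the entire body into memory, which undoes the benefit of stream=True.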


but now neither

tree = ET.parse(r.content)

nor

tree = ET.parse(r.text)

works; both raise exceptions.
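These exceptions are consistent with how ElementTree splits its API: ET.parse() expects a filename or a file object, so handing it the XML text makes it try to open that text as a file path. A document already held in memory goes through ET.fromstring(), or can be wrapped in io.BytesIO before calling ET.parse(). A minimal sketch of both options, written for Python 3 and using a small inline document in place of the real feed:

```python
import io
import xml.etree.ElementTree as ET

body = b'<?xml version="1.0" encoding="UTF-8"?><feed><item>a</item></feed>'

# Option 1: parse bytes already in memory; returns the root Element.
root = ET.fromstring(body)

# Option 2: wrap the bytes in a file-like object so ET.parse() accepts it;
# returns an ElementTree, whose root element comes from getroot().
tree = ET.parse(io.BytesIO(body))

print(root.tag, tree.getroot().tag)  # feed feed
```

Passing bytes (r.content) rather than r.text lets the parser honour the encoding declared in the XML itself instead of the encoding requests guessed when decoding the text.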


What's the proper way to do this?

