Saturday, 19 July 2014

Chatango XML parsing in Python



I've been trying to find the cleanest way to parse XML in Python. Chatango serves an XML page for each user with their profile information: date of birth (b tag), gender (s tag), their mini (body tag, URL-quoted) and location (l tag). I want to get the text of those tags, but if a user didn't fill out part of their profile, that tag and its text are missing from the XML entirely. So I'm checking whether each tag is present and reading its text; if it's not, I use a question mark instead. What I need help with is finding a cleaner way of doing that. I've looked for similar questions but didn't find anything, so hopefully you guys can help. :P
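One candidate for the "use the tag's text if it's there, otherwise a question mark" part is ElementTree's findtext(), which accepts a default value. This is only a minimal sketch: the sample string, its "mod" root element and the read_fields helper are made up for illustration and are not real Chatango output.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample, not real Chatango output: the "l" (location) tag is absent.
SAMPLE = "<mod><s>M</s><b>1990-05-17</b><body>hello%20world</body></mod>"

def read_fields(xml_text, tags=("s", "b", "l", "body"), missing="?"):
    """Return {tag: text}, using `missing` for absent or empty tags."""
    root = ET.fromstring(xml_text)
    # findtext() returns `default` when no matching child exists; the
    # trailing `or missing` also covers a tag that is present but empty.
    return {tag: root.findtext(tag, default=missing) or missing for tag in tags}

fields = read_fields(SAMPLE)
print(fields)
```

Looping over the tag names keeps the lookup to one line per profile, at the cost of returning raw strings that still need unquoting or date handling afterwards.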


Here are some of the XML pages:


This one has all the tags: http://ift.tt/1qSdZnu


And an example of one that only has some: http://ift.tt/WlA0BH


Here's the code I came up with:



import datetime
import urllib.parse
import urllib.request
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9

class prof:

    @staticmethod
    def getProf(name):
        # A one-character name is reused for both directory levels of the URL
        if len(name) == 1:
            url = "http://ift.tt/1qSdZUk"+name+"/"+name+"/"+name+"/mod1.xml"
        elif len(name) > 1:
            url = "http://ift.tt/1qSdZUk"+name[0]+"/"+name[1]+"/"+name+"/mod1.xml"
        f = urllib.request.urlopen(url)
        data = f.read().decode("utf-8")
        # fromstring() parses a string; ET.parse() expects a filename or file object
        data = ET.fromstring(data)
        today = datetime.date.today()
        if data.find("s") is not None:
            gender = data.find("s").text
        else:
            gender = "?"
        if data.find("b") is not None:
            # Split the date of birth on "-" into year, month, day and convert to int
            age = [int(part) for part in data.find("b").text.split("-")]
            # Subtract one year if this year's birthday hasn't happened yet
            age = today.year - age[0] - ((today.month, today.day) < (age[1], age[2]))
        else:
            age = "?"
        if data.find("l") is not None:
            location = data.find("l").text
        else:
            location = "?"
        if data.find("body") is not None:
            # The mini is URL-quoted, so decode it before displaying
            mini = urllib.parse.unquote(data.find("body").text)
        else:
            mini = "?"
        if len(mini) < 1575:
            return "%s Info - Gender: %s, Age: %s, Location: %s <br/> %s" % (name, gender, age, location, mini)
        else:
            return "%s Info - Gender: %s, Age: %s, Location: %s <br/> Too many characters to display!" % (name, gender, age, location)
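The age step is the easiest part to get wrong, because the pieces of the b tag are strings and have to be converted before any arithmetic. Here is a rough sketch of that step on its own, assuming the value looks like year-month-day separated by "-" (which is what the code above relies on). The age_from_birthdate name and the optional today parameter are hypothetical and only there to make it testable.

```python
import datetime

def age_from_birthdate(text, today=None):
    """Return age in whole years for a 'YYYY-MM-DD' string, or '?' if it cannot be parsed."""
    today = today or datetime.date.today()
    try:
        # Fails with AttributeError for None, ValueError for a malformed string
        year, month, day = (int(part) for part in text.split("-"))
    except (AttributeError, ValueError):
        return "?"
    # The tuple comparison is True (counts as 1) when the birthday
    # has not yet occurred in the current year.
    return today.year - year - ((today.month, today.day) < (month, day))

print(age_from_birthdate("1990-05-17", datetime.date(2014, 7, 19)))
```

Returning "?" for anything unparseable keeps the same fallback as the missing-tag case, so the caller doesn't need a separate check.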
