XML : Parsing RSS resources without any output

This might be a real beginner question, but I don't get an error, so I don't know what's going on.

This is my code:

    # -*- coding: utf-8 -*-
    import urllib2
    from urllib2 import urlopen
    import re
    import cookielib
    from cookielib import CookieJar
    import time


    cj = CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    def main():
        with open('word_list.txt') as f:
            word_list = f.readlines()

        try:
            pages = open('rss_sources.txt').readlines()
            for rss_resource in pages:
                sourceCode = opener.open(rss_resource).read()
            #print sourceCode

            try:
                titles = re.findall(r'<title>(.*?)</title>', sourceCode)

                for title in titles:
                    if any(word.lower() in title.lower() for word in word_list):
                        print title

            except Exception, e:
                print str(e)

        except Exception, e:
            print str(e)

    main()

My example RSS sources are:

http://www.finanzen.de/news/feed
http://www.welt.de/wirtschaft/?service=Rss

Issue: With only the first RSS source in the .txt file, everything works and the script prints the titles that contain keywords from word_list.txt. As soon as I add the second RSS source to the .txt file, I get no output at all and no error message. Not even the first RSS source produces any titles.

Is there a problem with the second resource? How would I handle that error? And why isn't the first resource parsed correctly?

Please point me in the right direction so I can take care of this :)
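
Two things might be worth checking first, sketched below under stated assumptions. First, readlines() keeps the trailing newline on every line, so a keyword read from the file as "euro\n" will not be found inside a one-line title. Second, if the title-matching block runs after the loop over the sources rather than inside it, only the last source fetched is ever searched. The sketch uses Python 3 syntax rather than the Python 2 of the post, works on feed text that has already been downloaded, and the helper names clean_lines, matching_titles and scan_feeds are hypothetical. It illustrates the idea and is not a confirmed diagnosis.

```python
import re

# Same pattern as in the question; DOTALL also lets a title span several lines.
TITLE_RE = re.compile(r'<title>(.*?)</title>', re.DOTALL)


def clean_lines(lines):
    """Strip surrounding whitespace (including '\n') and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]


def matching_titles(source_code, words):
    """Return the titles in one feed that contain any of the keywords."""
    lowered = [word.lower() for word in words]
    titles = TITLE_RE.findall(source_code)
    return [t for t in titles if any(w in t.lower() for w in lowered)]


def scan_feeds(feed_texts, raw_words):
    """Search every feed, not only the last one fetched."""
    words = clean_lines(raw_words)
    hits = []
    for text in feed_texts:
        # The matching happens inside the loop, once per feed.
        hits.extend(matching_titles(text, words))
    return hits
```

In the script from the question, the same idea would mean stripping each line of word_list.txt and rss_sources.txt before use, and calling the title search once per source inside the for loop.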
