Python crawler (BeautifulSoup) cannot get javascript streaming data from html website


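```python
from bs4 import BeautifulSoup
import urllib2

wiki = "http://bit.ly/1mb0tYS"
header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent a 403 error from sites that reject urllib2's default User-Agent

req = urllib2.Request(wiki, headers=header)
page = urllib2.urlopen(req)  # pass the Request object so the header is actually sent

soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly
print soup
```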

I used the above code, but I cannot get the data shown in the middle of the page: when I open it in Firefox, that part is loaded from an XML file every 300 ms. Is there any way to get this data from the website? Many thanks!
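urllib2 and BeautifulSoup only see the HTML the server sends; neither executes JavaScript, so values that a script fetches from an XML file every 300 ms never appear in `soup`. A common workaround is to request that XML file directly. The sketch below (Python 3, standard library only) assumes the feed URL appears literally in the page source; the `.xml` regex, the example URLs, and the leaf-element flattening are all illustrative assumptions, since the real page's script and XML schema are not shown. If the URL is built at runtime, read it from the Network tab of the browser's developer tools instead; a headless browser such as Selenium is the heavier alternative that actually runs the JavaScript.

```python
import re
import time
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

HEADERS = {"User-Agent": "Mozilla/5.0"}

# Assumption: only catches .xml URLs written literally in the page source
# (e.g. in an inline <script>). URLs assembled at runtime or hidden in
# external .js files will not be found by this pattern.
XML_URL_PATTERN = re.compile(r"""["']([^"']+?\.xml(?:\?[^"']*)?)["']""")


def fetch(url):
    """Download raw bytes, sending the same User-Agent header as the question."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def find_xml_urls(html, base_url):
    """Return absolute .xml URLs mentioned in the HTML, in first-seen order."""
    urls = []
    for match in XML_URL_PATTERN.finditer(html):
        absolute = urljoin(base_url, match.group(1))  # resolve relative paths
        if absolute not in urls:  # drop duplicates, keep order
            urls.append(absolute)
    return urls


def parse_records(xml_bytes):
    """Flatten every leaf element into a (tag, text) pair.

    The feed's real schema is unknown, so this generic walk is a placeholder
    to adapt once you have seen the actual XML.
    """
    root = ET.fromstring(xml_bytes)
    return [(el.tag, (el.text or "").strip()) for el in root.iter() if len(el) == 0]


def poll(feed_url, interval=0.3, rounds=10):
    """Re-fetch the feed `rounds` times, printing only when the data changes."""
    previous = None
    for _ in range(rounds):
        records = parse_records(fetch(feed_url))
        if records != previous:
            print(records)
            previous = records
        time.sleep(interval)


if __name__ == "__main__":
    # Offline demo with made-up page and feed contents (no network access).
    page = "<script>setInterval(function () { load('/data/live.xml'); }, 300);</script>"
    print(find_xml_urls(page, "http://example.com/page.html"))
    feed = b"<quotes><item><code>0001</code><price>12.5</price></item></quotes>"
    print(parse_records(feed))
```

Against the live site the flow would be roughly `html = fetch(page_url).decode("utf-8", "replace")` followed by `poll(find_xml_urls(html, page_url)[0])`. Polling every 300 ms puts steady load on the server, so a longer `interval` is the kinder default unless you really need every update.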
