from bs4 import BeautifulSoup
import urllib2
wiki = "http://bit.ly/1mb0tYS"
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia
req = urllib2.Request(wiki, headers=header)
page = urllib2.urlopen(req)  # pass the Request object so the User-Agent header is actually sent
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly
print soup
I used the code above, but I cannot get the data in the middle of the page. Watching it in Firefox, that section is refreshed from an XML feed at 300 ms intervals, so the content is filled in by JavaScript after the initial HTML loads and never appears in the raw page source. Is there any way to get this data? Many thanks!
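One common approach when content is loaded by JavaScript from a separate XML feed: find the feed's URL in the browser's developer tools (Network tab, filter by XHR), request that URL directly, and parse the XML instead of the HTML. The sketch below uses a hypothetical payload shaped like `<rows><row>…</row></rows>`; the real endpoint URL and element names would have to come from inspecting the actual site's network traffic.

```python
import xml.etree.ElementTree as ET

# Hypothetical example of the XML the page polls every 300 ms.
# In practice, fetch it with urllib2.urlopen() using the real feed
# URL discovered in the browser's Network tab.
sample_xml = """<rows>
  <row><name>alpha</name><value>1</value></row>
  <row><name>beta</name><value>2</value></row>
</rows>"""

def parse_rows(xml_text):
    """Extract (name, value) pairs from one polled XML response."""
    root = ET.fromstring(xml_text)
    return [(row.findtext("name"), row.findtext("value"))
            for row in root.findall("row")]

print(parse_rows(sample_xml))
```

If the feed URL cannot be found or requires session state, a browser-automation tool such as Selenium, which executes the page's JavaScript before you read the DOM, is the usual fallback.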