I need to scrape the short piece of text that Google returns for a query as part of its "Knowledge Graph" result (the panel usually on the right-hand side), which it takes from Wikipedia. I can then convert the plain text to a voice answer. Using Simple HTML DOM I have no problem scraping this kind of info from Bing or Ask, but on Google I just can't get at the DIV (and SPAN) in which the result is nested. Simple function below:
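// Simple HTML DOM provides file_get_html(); adjust the path to your copy.
include_once 'simple_html_dom.php';

// urlencode() escapes every reserved character, not only spaces.
$question = urlencode($_GET['question'] ?? '');
$address = 'http://ift.tt/1kVmble'.$question;
$ret = scraping_Google($address);

function scraping_Google($url) {
    // create HTML DOM; file_get_html() returns false when the fetch fails
    $html = file_get_html($url);
    if (!$html) {
        return '';
    }
    // get the Knowledge Graph description; find() returns null when nothing matches
    $node = $html->find('div.kno-rdesc', 0);
    $ret = $node ? $node->plaintext : '';
    // clean up memory
    $html->clear();
    unset($html);
    return $ret;
}

echo $ret;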
The content is nested in div.kno-rdesc (easy to find with the code inspector in Chrome), yet I have had no success parsing this tiny piece of information. Is anybody able to help out? Cheers!
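One way to narrow this down is to run the same class lookup against a small local fixture. This is only a rough sketch: it uses PHP's built-in DOMDocument and DOMXPath in place of Simple HTML DOM so it runs on its own, and both the `$fixture` markup and the helper name `first_div_text_by_class` are hypothetical, not something Google or the library provides. If the lookup works on the fixture, the selector is probably fine and the next thing to check is whether the HTML actually fetched by the script contains that div at all, which I have not verified.

```php
<?php
// Hypothetical fixture: a stand-in for the markup seen in Chrome's inspector.
$fixture = '<html><body><div class="kno-rdesc"><span>Example description text.</span></div></body></html>';

/**
 * Return the trimmed text of the first <div> carrying the given class,
 * or null when no such element exists.
 * Hypothetical helper written for illustration only.
 */
function first_div_text_by_class(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // nothing matched: the div is not in this HTML
    }
    return trim($nodes->item(0)->textContent);
}

// The fixture contains the div, so its text is returned.
var_dump(first_div_text_by_class($fixture, 'kno-rdesc'));
// This page has no such div, so the helper returns null instead of failing.
var_dump(first_div_text_by_class('<html><body><p>nothing here</p></body></html>', 'kno-rdesc'));
```

Saving the fetched page to a file and passing its contents to the same helper would show whether the missing result comes from the selector or from the markup that was downloaded.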