Get image from RSS Feeds with no image URL



I would just like to know how other developers properly get/extract the first image in a blog post's main content, starting from the post URL in the RSS feed. This is the approach I came up with, since the RSS feed doesn't include an image URL for the post/blog item. Though I keep seeing



<img src="http://ift.tt/1oFDtqD" />


but it's only a 1px image. Does it have any relevance to the feed item, or can I somehow convert it to the actual image? Here's the RSS feed: http://ift.tt/1mWCPzw
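Before scraping the linked page at all, it may be worth checking the feed item itself. A 1x1 image like the one above is most likely a tracking pixel used for feed statistics rather than post content, so it can simply be skipped. The sketch below assumes an RSS 2.0 feed and tries, in order: an image enclosure, a Media RSS media:content or media:thumbnail URL (only present if the publisher uses that namespace), and finally the first non-1x1 img in the description. The helper names image_from_item and is_tracking_pixel are hypothetical.

```php
<?php
// Sketch: find an image in the RSS item itself before fetching the linked page.
// Assumes RSS 2.0; Media RSS tags are only present if the publisher uses them.

const MEDIA_NS = 'http://search.yahoo.com/mrss/';

// Treat <img width="1" height="1"> as a tracking pixel. Pixels that omit these
// attributes are not detected by this check.
function is_tracking_pixel(DOMElement $img): bool {
    return $img->getAttribute('width') === '1' && $img->getAttribute('height') === '1';
}

function image_from_item(SimpleXMLElement $item): ?string {
    // 1. An <enclosure> whose MIME type is an image.
    foreach ($item->enclosure as $enc) {
        if (strpos((string) $enc['type'], 'image/') === 0) {
            return (string) $enc['url'];
        }
    }

    // 2. <media:content url="..."> or <media:thumbnail url="...">.
    $item->registerXPathNamespace('media', MEDIA_NS);
    $urls = $item->xpath('media:content/@url | media:thumbnail/@url') ?: [];
    if (count($urls) > 0) {
        return (string) $urls[0];
    }

    // 3. The first <img> in the description that is not a 1x1 tracking pixel.
    $html = (string) $item->description;
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // feed HTML is often invalid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '' && !is_tracking_pixel($img)) {
            return $img->getAttribute('src');
        }
    }
    return null;
}
```

Only when this returns null would you fall back to downloading and scraping the post page.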


Anyway, here's my attempt to extract the image using the URL from the feed:



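function extract_first_image( $url ) {
    $content = file_get_contents($url);
    if ($content === false) {
        return null; // Download failed.
    }

    // Narrow the html to get the main div with the blog content only.
    // source: http://ift.tt/1oFDqeu
    $PreMain = explode('<div id="main-content"', $content);
    if (!isset($PreMain[1])) {
        return null; // This page has no <div id="main-content">.
    }
    $main = explode("</div>", $PreMain[1]);

    // Index 12 is specific to this site's markup; other layouts need another index.
    if (!isset($main[12])) {
        return null;
    }

    // Regex that matches an img tag and captures its src attribute.
    if (!preg_match('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $main[12], $matches)) {
        return null; // No image found.
    }

    // Return the whole img tag in html format.
    return $matches[0];
}

$url = 'http://ift.tt/1mWCQ6I'; // Sample URL from the feed.
echo extract_first_image($url);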


The obvious downside of this function: it only works if <div id="main-content" is present in the HTML, and the $main[12] index only matches this one layout. When another feed links to pages with a different structure, I need another explode for that as well. It's very static.
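One way to make this less static is to parse the page with DOMDocument and DOMXPath instead of exploding raw markup, so the only site-specific part is an XPath expression for the content container. In this sketch, first_content_image and container_for are hypothetical names, and the CONTAINERS map is a hypothetical per-site configuration keyed by the post's host (an ift.tt link would need to be resolved to the blog's own URL first). It falls back to the page's og:image meta tag, which many blogs include for sharing but which is not guaranteed to exist.

```php
<?php
// Sketch: first content image via DOM + XPath instead of explode() on raw HTML.

function first_content_image(string $html, string $containerXPath = '//body'): ?string {
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // First <img> that has a src, anywhere inside the chosen container.
    $imgs = $xpath->query($containerXPath . '//img[@src]');
    if ($imgs !== false && $imgs->length > 0) {
        return $imgs->item(0)->getAttribute('src');
    }

    // Fallback: the Open Graph image declared in the page head, if any.
    $og = $xpath->query('//meta[@property="og:image"]/@content');
    if ($og !== false && $og->length > 0) {
        return $og->item(0)->nodeValue;
    }
    return null;
}

// Hypothetical per-site configuration: one XPath per layout instead of one explode per layout.
const CONTAINERS = [
    'example-blog.com' => '//div[@id="main-content"]',
];

function container_for(string $url): string {
    $host = parse_url($url, PHP_URL_HOST) ?: '';
    return CONTAINERS[$host] ?? '//body'; // unknown sites: search the whole body
}
```

Supporting a new site then means adding one XPath entry rather than writing new string-splitting code.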


I guess it's also worth mentioning the load time. Each call downloads a whole page, so when I loop through all the items in the feed, it takes even longer.
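Most of that time is probably spent in sequential file_get_contents() calls, one per feed item. A sketch of downloading all item pages in parallel with PHP's curl_multi functions, assuming the curl extension is installed; fetch_all is a hypothetical name and the 10-second timeout an arbitrary default:

```php
<?php
// Sketch: fetch many URLs concurrently with curl_multi. Requires ext-curl.
// Returns the body for each key, or null when the request did not return 2xx.

function fetch_all(array $urls, int $timeout = 10): array {
    if ($urls === []) {
        return [];
    }
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true, // keep the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true, // shortened feed links redirect
            CURLOPT_TIMEOUT => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait up to 1s for activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect bodies; a status of 0 means the connection itself failed.
    $results = [];
    foreach ($handles as $key => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $results[$key] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

The extraction step would then need to accept the downloaded HTML rather than a URL, since the page has already been fetched. Caching the extracted image URL per item link would also avoid downloading posts that were already processed on an earlier run.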


I hope I made my points clear. Feel free to suggest any ideas that could help optimize the solution.

