Get images from an RSS feed that has no image URLs

I would like to know how other developers reliably extract the first image in the main blog content of a page, given the page URL from an RSS feed. This is the approach I came up with, since the RSS feed doesn't seem to include an image URL for each post/blog item. However, I keep seeing

<img src="http://feeds.feedburner.com/~r/CookingLight/EatingSmart/~4/sIG3nePOu-c" />

but it's only a 1px image. Does it have any relevant value for the feed item, or can I somehow convert it to the actual image? Here's the RSS feed: http://feeds.cookinglight.com/CookingLight/EatingSmart?format=xml

Anyway, here's my attempt to extract the image using the url in the feeds:

function extract_first_image($url) {
  $content = file_get_contents($url);
  if ($content === false) {
    return null; // The page could not be fetched.
  }

  // Narrow the HTML to the main div that holds the blog content only.
  // source: http://stackoverflow.com/questions/15643710/php-get-a-div-from-page-x
  $preMain = explode('<div id="main-content"', $content);
  if (!isset($preMain[1])) {
    return null; // The page has no <div id="main-content">.
  }
  $main = explode('</div>', $preMain[1]);

  // Regex that finds img tags. The index 12 is specific to this site's markup.
  if (!isset($main[12]) || !preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $main[12], $matches)) {
    return null; // No img tag was found in the expected chunk.
  }

  // Return the first img tag in HTML format.
  return $matches[0][0];
}

$url = 'http://www.cookinglight.com/eating-smart/nutrition-101/foods-that-fight-fat'; // Sample URL from the feed.
echo extract_first_image($url);

The obvious downside of this function: it only works if <div id="main-content" is found in the HTML. Every other site with a different structure would need its own explode logic, so the approach is very static.

It's also worth mentioning the load time. When I loop through all the items in the feed, it takes even longer.

I hope I've made my points clear. Feel free to suggest any ideas that could help optimize the solution.

1 answer

dongliao1949 2014-08-18 18:07
The image URLs are in the RSS file, so you can get them just by parsing the XML. Each <item> element contains a <media:group> element that contains a <media:content> element. The URL of the image for that item is in the "url" attribute of the <media:content> element. Here is some basic PHP code for extracting the image URLs into an array:

$xml = simplexml_load_file("http://feeds.cookinglight.com/CookingLight/EatingSmart?format=xml");
$imageUrls = array();
foreach($xml->channel->item as $item) {
    array_push($imageUrls, (string)$item->children('media', true)->group->content->attributes()->url);
}

Keep in mind, though, that the media doesn't necessarily have to be an image. It can be a video or an audio recording. There might even be more than one <media:group>. You can check the "type" attribute of the <media:content> element to see what it is.
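
As a rough illustration of that type check, here is a minimal sketch that walks every <media:group> in each item and keeps the first <media:content> whose "type" starts with "image/". The inline sample feed, the example.com URLs, and the helper name first_image_urls are assumptions made up for this example so it can run without network access; they are not taken from the real Cooking Light feed.

```php
<?php
// Hypothetical sample feed, inlined so the sketch runs offline.
$rss = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <media:group>
        <media:content url="http://example.com/a.mp4" type="video/mp4" />
        <media:content url="http://example.com/a.jpg" type="image/jpeg" />
      </media:group>
    </item>
    <item>
      <title>Second post</title>
      <media:group>
        <media:content url="http://example.com/b.mp3" type="audio/mpeg" />
      </media:group>
      <media:group>
        <media:content url="http://example.com/b.png" type="image/png" />
      </media:group>
    </item>
    <item>
      <title>Third post (no media)</title>
    </item>
  </channel>
</rss>
XML;

/**
 * Return one entry per <item>: the URL of its first image-typed
 * <media:content>, or null when the item has no image.
 * (Hypothetical helper, named for this example only.)
 */
function first_image_urls(SimpleXMLElement $xml): array
{
    $urls = array();
    foreach ($xml->channel->item as $item) {
        $found = null;
        // An item may carry several <media:group> elements, so check them all.
        foreach ($item->children('media', true)->group as $group) {
            foreach ($group->children('media', true)->content as $content) {
                $attrs = $content->attributes(); // "url" and "type" have no namespace.
                $type = (string) $attrs->type;
                if (strpos($type, 'image/') === 0) {
                    $found = (string) $attrs->url;
                    break 2; // Stop at the first image of this item.
                }
            }
        }
        $urls[] = $found;
    }
    return $urls;
}

$imageUrls = first_image_urls(simplexml_load_string($rss));
print_r($imageUrls);
```

With a real feed you would pass the result of simplexml_load_file() instead. Items that come back as null are the ones where a fallback, such as fetching the linked page, might still be needed.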
This answer was selected as the best answer by the asker.
