dongxianglun5163
2010-10-19 11:52

Parsing RSS feeds with PHP

I need to aggregate RSS content from roughly 500 URLs. When I try to fetch content from those URLs, I get timeout and memory-exhausted errors (I am using the SimplePie library).

Is there any method or idea for pulling content quickly from that many sources?

How do I get fresh content every time?

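<?php
require_once('include/simplepie.inc');

// Feed URLs to aggregate (roughly 500 in total; abbreviated here).
$urlList = array(
    'http://site1.com/index.rss',
    'http://site2.com/index.rss',
    'http://site3.com/index.rss',
    // ...
    'http://site500.com/index.rss',
);

$feed = new SimplePie();
$feed->set_feed_url($urlList);  // an array of URLs makes SimplePie merge the feeds
$feed->init();                  // fetches and parses every feed within this request
$feed->handle_content_type();   // sends the Content-Type header
?>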

HTML portion:

<?php  
foreach($feed->get_items() as $item):  
?>  
<div class="item">
<h2><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></h2>
<p><?php echo $item->get_description(); ?></p>
<p><small>Posted on <?php echo $item->get_date('j F Y | g:i a'); ?></small></p>
</div>
<?php endforeach; ?>


4 answers

  • douruoshen1449 2010-10-19 11:58
    Accepted answer

    I think you're doing it wrong. If you want to parse that many feeds, you cannot do it from a script that will be called via a webserver.

    If you really want to do the polling, you will have to run that script through, say, cron and then save the results so that another PHP script (which can be called by the HTTP server) serves them.
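
    A minimal sketch of that split, assuming a JSON cache file and two hypothetical helpers (saveItems and loadItems are illustrative names, not part of SimplePie): the cron-run worker writes the aggregated items to disk, and the web-facing script only reads them.

    ```php
    <?php
    // Hypothetical helpers; the names and the JSON file format are assumptions.

    // Called by the cron-run worker after it has fetched and parsed the feeds.
    // Writes to a temporary file first and then renames it, so the web script
    // never reads a half-written cache.
    function saveItems(array $items, string $cacheFile): void
    {
        $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
        if (file_put_contents($tmp, json_encode($items)) === false) {
            throw new RuntimeException("Cannot write $tmp");
        }
        if (!rename($tmp, $cacheFile)) {
            throw new RuntimeException("Cannot replace $cacheFile");
        }
    }

    // Called by the script behind the HTTP server: no network access, just a read.
    function loadItems(string $cacheFile): array
    {
        if (!is_file($cacheFile)) {
            return array(); // the worker has not run yet
        }
        $items = json_decode(file_get_contents($cacheFile), true);
        return is_array($items) ? $items : array();
    }
    ```

    A crontab entry such as `*/15 * * * * php /path/to/worker.php` (the path is a placeholder) would then run the worker every 15 minutes, independent of any web request.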

    However, you will still have to deal with the inherent limitations of polling: 99% of the time there is no new content, so you waste your own CPU and bandwidth as well as those of the servers you're polling. You will also have to deal with dead feeds, invalid ones, rate limiting, and so on.

    Implement the PubSubHubbub protocol. It helps with the feeds that have implemented it: you just wait for the data to be pushed to you.
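
    As a rough illustration of the subscriber side, here is a sketch of a callback handler. The function name, its parameters, and the return shape are assumptions made for this example. It covers the two requests a hub sends to the callback URL: a GET to verify the subscription, answered by echoing hub.challenge, and a POST carrying the updated feed document.

    ```php
    <?php
    // Hypothetical callback handler; returns array(HTTP status code, response body).
    // $topics lists the feed URLs we asked to subscribe to.
    // $onContent receives the raw feed document pushed by the hub.
    function handleHubRequest(string $method, array $query, string $body, array $topics, callable $onContent): array
    {
        if ($method === 'GET') {
            // PHP rewrites dots in query parameter names to underscores,
            // so hub.challenge arrives as $_GET['hub_challenge'].
            $mode = $query['hub_mode'] ?? '';
            $topic = $query['hub_topic'] ?? '';
            $challenge = $query['hub_challenge'] ?? '';
            if ($mode === 'subscribe' && $challenge !== '' && in_array($topic, $topics, true)) {
                return array(200, $challenge); // echo the challenge to confirm
            }
            return array(404, ''); // not a subscription we requested
        }
        if ($method === 'POST') {
            $onContent($body); // store or parse the pushed entries
            return array(204, '');
        }
        return array(405, '');
    }
    ```

    In a real endpoint the arguments would come from `$_SERVER['REQUEST_METHOD']`, `$_GET`, and `file_get_contents('php://input')`.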

    For the other feeds, you can either do the polling yourself, as you do now, and find a way to handle the individual errors (invalid XML, dead hosts, etc.), or rely on a service like Superfeedr (I created it).

  • doudi5524 2010-10-19 11:54

    Increase memory_limit in php.ini (memory_limit = xxM), or call ini_set("memory_limit", "xxM"), where xx is the new limit in megabytes.

  • doutan1970 2010-10-19 12:17

    My experience with SimplePie is that it isn't very good or robust. Try using simplexml_import_dom() instead.
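
    For illustration, a minimal sketch of that route, assuming plain RSS 2.0 input; parseRssItems is a hypothetical helper name. It loads the document with DOMDocument, converts it with simplexml_import_dom(), and returns an empty array for malformed feeds instead of aborting the whole run.

    ```php
    <?php
    // Hypothetical helper: parse one RSS 2.0 document into plain arrays.
    function parseRssItems(string $xml): array
    {
        if ($xml === '') {
            return array();
        }
        // Collect libxml errors internally instead of emitting PHP warnings.
        $previous = libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $loaded = $dom->loadXML($xml);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        if (!$loaded) {
            return array(); // malformed feed: skip it
        }

        $rss = simplexml_import_dom($dom);
        if (!($rss instanceof SimpleXMLElement) || !isset($rss->channel->item)) {
            return array(); // not an RSS document, or no items
        }

        $items = array();
        foreach ($rss->channel->item as $item) {
            $items[] = array(
                'title' => (string) $item->title,
                'link'  => (string) $item->link,
                'date'  => (string) $item->pubDate,
            );
        }
        return $items;
    }
    ```

    Fields missing from an item come back as empty strings, so the caller can decide how to treat incomplete entries.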

  • drb56625 2010-10-19 12:38

    Is there any method/idea to pull out content fast from bulk sources?

    Trying to poll all 500 URLs synchronously puts a lot of stress on the system. You can mitigate this by running the transfers in parallel with the curl_multi_* functions (the version of SimplePie I have here doesn't use them for multiple transfers). If the volume of requests for the composite feed merits it, the best solution is to run a scheduler that downloads each feed to your server when its current content is set to expire (applying a sensible minimum interval), and then to composite the feed from the stored data. Note that with this approach you'll need to implement some careful semaphores or use a DBMS to store the data; PHP's file-locking semantics are not very sophisticated.
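
    A sketch of the parallel transfer step with the curl_multi_* functions. fetchAll is a hypothetical helper written for this example and is not how SimplePie fetches feeds. It downloads one batch of URLs concurrently and returns null for any transfer that failed, so one dead host does not break the batch.

    ```php
    <?php
    // Hypothetical helper: fetch a batch of URLs in parallel.
    // Returns array(url => body), with null for transfers that failed.
    function fetchAll(array $urls, int $timeoutSeconds = 10): array
    {
        $multi = curl_multi_init();
        $handles = array();
        foreach (array_unique($urls) as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
            curl_multi_add_handle($multi, $ch);
            $handles[$url] = $ch;
        }

        // Drive all transfers until none is still running.
        do {
            $status = curl_multi_exec($multi, $running);
            if ($running > 0) {
                curl_multi_select($multi, 1.0); // wait for activity instead of spinning
            }
        } while ($running > 0 && $status === CURLM_OK);

        // Read the per-transfer result codes reported by the multi handle.
        $codes = array();
        while (($info = curl_multi_info_read($multi)) !== false) {
            foreach ($handles as $url => $ch) {
                if ($ch === $info['handle']) {
                    $codes[$url] = $info['result'];
                }
            }
        }

        $results = array();
        foreach ($handles as $url => $ch) {
            $ok = isset($codes[$url]) && $codes[$url] === CURLE_OK;
            $results[$url] = $ok ? curl_multi_getcontent($ch) : null;
            curl_multi_remove_handle($multi, $ch);
            curl_close($ch);
        }
        curl_multi_close($multi);
        return $results;
    }
    ```

    For 500 feeds, something like `array_chunk($urlList, 20)` (the batch size is an arbitrary choice) keeps the number of simultaneous connections bounded; each batch is passed to fetchAll() in turn.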
