Parsing an XML feed with CDATA using PHP SimpleXML [duplicate]

This question already has an answer here:

How to parse CDATA HTML-content of XML using SimpleXML? 2 answers

I am converting an RSS feed to JSON using PHP, with the code below.

My JSON output contains the data from the description of each item element, but the title and link data are not extracted.

The problem is either incorrect CDATA in the feed or my code not parsing it correctly.

The XML comes from the feed URL below.

$blog_url = 'http://www.blogdogarotinho.com/rssfeedgenerator.ashx';

$rawFeed = file_get_contents($blog_url);
$xml = simplexml_load_string($rawFeed, 'SimpleXMLElement', LIBXML_NOCDATA);

// step 2: prepare the array that will hold the articles
$articles = array();

// step 3: extract the articles

foreach ($xml->channel->item as $item) {
    $article = array();

    // cast SimpleXMLElement values to strings so json_encode() emits plain values
    $article['title'] = trim((string) $item->title);
    $article['link'] = (string) $item->link;
    $article['pubDate'] = (string) $item->pubDate;
    $article['timestamp'] = strtotime((string) $item->pubDate);
    $article['description'] = trim((string) $item->description);
    $article['isPermaLink'] = (string) $item->guid['isPermaLink'];

    $articles[$article['timestamp']] = $article;
}

echo json_encode($articles);

dplsnw7329 2014-06-01 18:28
I think you are just the victim of the browser hiding the tags. Let me explain: your input feed doesn't really have <![CDATA[ ]]> sections in it. The < and > characters are actually entity-encoded in the raw source of the RSS stream. Press Ctrl+U on the RSS link in your browser and you will see:

<?xml version="1.0" encoding="utf-16"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" version="2.0">
  <channel>
    <description>Blog do Garotinho</description>
    <item>
      <description>&lt;![CDATA[&lt;br&gt; Fico impressionado com a hipocrisia e a falsidade de certos políticos....]]&gt; </description>
      <link>&lt;![CDATA[http://www.blogdogarotinho.com.br/lartigo.aspx?id=16796]]&gt;</link>
      ...
      <title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>
    </item>

As you can see, the content of <title>, for example, starts with &lt;, which turns into a literal < when SimpleXML returns it for your JSON data. Now, if you are looking at the printed JSON data in a browser, the browser will see the following:

"title":"<![CDATA[A bancada dos caras de pau]]>"

This will not be rendered, because the browser treats it as being inside a tag. The description seems to show up because it contains a <br> tag at some point, which ends the first "tag", and thus you can see the rest of the output.

If you press Ctrl+U on the output page, you should see the output printed as expected (I used a command-line PHP script myself and did not notice this at first).
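The effect described above can be reproduced without the real feed. The following is a minimal sketch, assuming a made-up two-element feed (the `$rawFeed` string and the `example.com` link are hypothetical, not taken from the actual blog): one element has entity-encoded CDATA markers, as in the excerpt, and the other has a genuine CDATA section for comparison.

```php
<?php
// Hypothetical sample feed, for illustration only.
// <title> carries entity-encoded markers (&lt;![CDATA[ ... ]]&gt;),
// <link> carries a genuine CDATA section.
$rawFeed = '<rss version="2.0"><channel><item>'
    . '<title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>'
    . '<link><![CDATA[http://example.com/real-cdata]]></link>'
    . '</item></channel></rss>';

$xml  = simplexml_load_string($rawFeed, 'SimpleXMLElement', LIBXML_NOCDATA);
$item = $xml->channel->item;

// The entities are decoded to plain text, so the markers survive as literal characters.
$title = (string) $item->title;
// A real CDATA section is unwrapped by the parser; only its content remains.
$link  = (string) $item->link;

echo $title, PHP_EOL; // <![CDATA[A bancada dos caras de pau]]>
echo $link, PHP_EOL;  // http://example.com/real-cdata
```

Run from the command line, both values are printed in full, which is why the problem only shows up when the output is viewed as rendered HTML in a browser.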

Try this demo:

There seems to be an empty "" after the "title":
http://codepad.viper-7.com/ZYpaS1

However, if I put htmlspecialchars() around the json_encode() call, the values become "visible":
http://codepad.viper-7.com/1nHqym
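The linked demos may no longer be reachable, so here is a rough equivalent of the second one. It is only a sketch: the `$articles` array is a hand-written stand-in for what the parsing loop would produce, not real feed data.

```php
<?php
// Hypothetical stand-in for the parsed feed: the title still contains the literal CDATA markers.
$articles = [
    ['title' => '<![CDATA[A bancada dos caras de pau]]>'],
];

// Plain JSON: a browser rendering this as HTML treats "<![CDATA[...]]>" as markup and hides it.
$json = json_encode($articles);

// Escaped JSON: <, > and " become entities, so the browser shows them as visible text.
$visible = htmlspecialchars($json);

echo $json, PHP_EOL;
echo $visible, PHP_EOL;
```

This escaping is only useful for inspecting the output in a browser. If the JSON is meant for a client program, sending it with an `application/json` content type instead should keep the browser from interpreting it as HTML.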

You could get rid of these markers by stripping them after the parse with a simple preg_replace():

function clean_cdata($str) {
    return preg_replace('#(^\s*<!\[CDATA\[|\]\]>\s*$)#sim', '', (string)$str);
}

This should take care of the CDATA markers if they are at the start or the end of an individual element's content. You can call it inside the foreach() loop like this:

// ....
$article['title'] = clean_cdata($item->title);
// ....
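Putting the pieces together, a minimal end-to-end sketch might look like the following. The `clean_cdata()` function is the one from this answer. The inline `$rawFeed` and its `example.com` link are hypothetical sample data, used so the script runs without fetching the real feed.

```php
<?php
// Helper from the answer: strips a leading "<![CDATA[" and a trailing "]]>".
function clean_cdata($str) {
    return preg_replace('#(^\s*<!\[CDATA\[|\]\]>\s*$)#sim', '', (string)$str);
}

// Hypothetical sample feed with entity-encoded CDATA markers (illustration only).
$rawFeed = '<rss version="2.0"><channel><item>'
    . '<title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>'
    . '<link>&lt;![CDATA[http://example.com/artigo?id=1]]&gt;</link>'
    . '</item></channel></rss>';

$xml = simplexml_load_string($rawFeed, 'SimpleXMLElement', LIBXML_NOCDATA);

$articles = [];
foreach ($xml->channel->item as $item) {
    // clean_cdata() casts to string itself, so the elements can be passed directly.
    $articles[] = [
        'title' => clean_cdata($item->title),
        'link'  => clean_cdata($item->link),
    ];
}

echo json_encode($articles), PHP_EOL;
```

Note that json_encode() escapes forward slashes by default, so the link appears with `\/` in the output. Passing the `JSON_UNESCAPED_SLASHES` flag avoids that if it matters for your consumer.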
This answer was selected by the asker as the best answer.