Parsing an XML feed with CDATA using PHP SimpleXML [duplicate]

This question already has an answer here:

How to parse CDATA HTML-content of XML using SimpleXML? 2 answers

I am parsing an RSS feed to JSON using PHP, with the code below.

My JSON output contains the description data from each item element, but the title and link data are not extracted.

The problem lies either with incorrect CDATA in the feed or with my code not parsing it correctly.

The XML is here:

$blog_url = 'http://www.blogdogarotinho.com/rssfeedgenerator.ashx';

// step 1: download and parse the feed
$rawFeed = file_get_contents($blog_url);
$xml = simplexml_load_string($rawFeed, 'SimpleXMLElement', LIBXML_NOCDATA);

// step 2: prepare the result array
$articles = array();

// step 3: extract the articles
foreach ($xml->channel->item as $item) {
    $article = array();

    // cast each node to string so json_encode() emits plain values, not objects
    $article['title'] = trim((string) $item->title);
    $article['link'] = (string) $item->link;
    $article['pubDate'] = (string) $item->pubDate;
    $article['timestamp'] = strtotime((string) $item->pubDate);
    $article['description'] = trim((string) $item->description);
    $article['isPermaLink'] = (string) $item->guid['isPermaLink'];

    $articles[$article['timestamp']] = $article;
}

echo json_encode($articles);

Answer by dplsnw7329, 2014-06-01 18:28:
I think you are just the victim of the browser hiding the tags. Let me explain: your input feed doesn't really have <![CDATA[ ]]> sections in it. The < and > characters are actually entity-encoded (as &lt; and &gt;) in the raw source of the RSS stream. Hit Ctrl+U on the RSS link in your browser and you will see:

<?xml version="1.0" encoding="utf-16"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" version="2.0">
  <channel>
    <description>Blog do Garotinho</description>
    <item>
      <description>&lt;![CDATA[&lt;br&gt; Fico impressionado com a hipocrisia e a falsidade de certos políticos....]]&gt; </description>
      <link>&lt;![CDATA[http://www.blogdogarotinho.com.br/lartigo.aspx?id=16796]]&gt;</link>
      ...
      <title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>
    </item>

As you can see, the content of <title>, for example, starts with &lt;, which turns into a literal < when SimpleXML returns it for your JSON data. If you look at the printed JSON data in a browser, the browser will see the following:

"title":"<![CDATA[A bancada dos caras de pau]]>"

This will not be rendered, because the browser treats it as being inside a tag. The description seems to show up because it contains a <br> tag at some point, whose > ends the first "tag", so you can see the rest of the output.
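To make the decoding step concrete, here is a minimal sketch using a made-up inline sample rather than the live feed (the $sample string is an assumption that only mimics the feed's structure). It suggests why LIBXML_NOCDATA does not help here: the markers are ordinary entity-encoded text, not real CDATA sections.

```php
<?php
// Hypothetical inline sample that mimics the feed: the CDATA markers are
// entity-encoded, so to the XML parser they are plain text, not CDATA sections.
$sample = '<rss version="2.0"><channel><item>'
    . '<title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>'
    . '</item></channel></rss>';

$xml = simplexml_load_string($sample, 'SimpleXMLElement', LIBXML_NOCDATA);
$title = (string) $xml->channel->item->title;

// The parser decodes &lt; and &gt;, so the markers come back as literal text.
echo $title, PHP_EOL; // prints <![CDATA[A bancada dos caras de pau]]>

// True: the returned string itself begins with the literal marker.
var_dump(strpos($title, '<![CDATA[') === 0);
```

Viewed in a terminal, the title is fully visible; only a browser rendering the output as HTML hides it.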

If you hit Ctrl+U on the page that prints your JSON, you should see the output as expected (I used a command-line PHP script myself and did not notice this at first).

Try this demo, where there seems to be an empty "" after "title":
http://codepad.viper-7.com/ZYpaS1

However, if I put htmlspecialchars() around the json_encode() call, the values become visible:
http://codepad.viper-7.com/1nHqym
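Since the demo links may not stay available, here is a rough sketch of the same idea. The one-item $articles array is a hypothetical stand-in for the array built in the question, and the ENT_QUOTES and 'UTF-8' arguments are my own choice, not something taken from the demo.

```php
<?php
// Hypothetical one-item array standing in for $articles from the question.
$articles = array(
    array('title' => '<![CDATA[A bancada dos caras de pau]]>'),
);

$json = json_encode($articles);

// Escape for display inside an HTML page: < and > become &lt; and &gt;,
// so the browser shows the markers instead of swallowing them as a tag.
$visible = htmlspecialchars($json, ENT_QUOTES, 'UTF-8');

echo $visible, PHP_EOL;
```

This is only useful for inspecting the data in a browser. A program consuming the JSON would want the unescaped $json.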

You could try to get rid of these by simply stripping them out after the parse with a simple preg_replace():

function clean_cdata($str)
{
    return preg_replace('#(^\s*<!\[CDATA\[|\]\]>\s*$)#sim', '', (string)$str);
}

This should take care of the CDATA markers if they are at the start or the end of the individual elements. You can call this inside the foreach() loop like this:

// ....
$article['title'] = clean_cdata($item->title);
// ....
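Put together, the approach might look like the sketch below. It runs against a hypothetical inline sample instead of the live feed, and it repeats clean_cdata() only so that the snippet works on its own. Applying the helper to link as well as title is my assumption, based on the question reporting both fields as missing.

```php
<?php
// Same helper as in the answer, repeated so that this sketch runs on its own.
function clean_cdata($str)
{
    return preg_replace('#(^\s*<!\[CDATA\[|\]\]>\s*$)#sim', '', (string) $str);
}

// Hypothetical inline feed with entity-encoded CDATA markers, as in the real one.
$sample = '<rss version="2.0"><channel><item>'
    . '<title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>'
    . '<link>&lt;![CDATA[http://www.blogdogarotinho.com.br/lartigo.aspx?id=16796]]&gt;</link>'
    . '</item></channel></rss>';

$xml = simplexml_load_string($sample);

$articles = array();
foreach ($xml->channel->item as $item) {
    // Strip the literal markers from every field that carries them.
    $articles[] = array(
        'title' => clean_cdata($item->title),
        'link'  => clean_cdata($item->link),
    );
}

echo json_encode($articles), PHP_EOL;
```

The helper only removes markers at the start or end of a value, so a marker in the middle of a string would be left untouched.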
This answer was selected by the asker as the best answer.

Question tags: json, php, xml. Asked 2014-06-01 15:12.