drvonr6573 2013-07-09 14:36
Views: 143
Accepted

Parsing HTML tags inside XML in PHP

I'm trying to create my own RSS feed (for learning purposes) in PHP, using simplexml_load_string to parse http://uk.news.yahoo.com/rss. I'm stuck on reading the HTML tags inside the <description> tag.

My code so far looks like this:

$feed = file_get_contents('http://uk.news.yahoo.com/rss');
$rss = simplexml_load_string($feed);

//for each element in the feed
foreach ($rss->channel->item as $item) {
    echo '<h3>'. $item->title . '</h3>'; 

        foreach($item->description as $desc){

             //how to read the href from the a tag???

             //this does not work at all
             $tags = $item->xpath('//a');
             foreach ($tags as $tag) {
                 echo $tag['href'];
             }
       }
}

Any ideas how to extract each HTML tag?

Thanks


3 answers

  • dsbfbz75185 2013-07-09 15:28

    The content of <description> has its special characters encoded, so it is not parsed as nodes within the XML; it is just a string. You can decode the special characters, load the resulting HTML into DOMDocument, and then query it however you like. For example:

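    foreach ($rss->channel->item as $item) {
        echo '<h3>' . $item->title . '</h3>';

        foreach ($item->description as $desc) {
            // Parse the decoded description as an HTML fragment.
            $dom = new DOMDocument();
            $dom->loadHTML(htmlspecialchars_decode((string) $desc));

            // Print the href of the first link, if the description has one.
            $anchors = $dom->getElementsByTagName('a');
            if ($anchors->length > 0) {
                echo $anchors->item(0)->getAttribute('href');
            }
        }
    }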
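
To see why the `xpath('//a')` call in the question finds nothing, here is a minimal sketch. It assumes a small inline sample feed (the URL and text are made up) in place of the live Yahoo response, and it shows that the escaped markup is plain text in the XML tree: no `<a>` nodes exist until that text is parsed as HTML.

```php
<?php
// Hypothetical sample feed, standing in for the live RSS response.
$feed = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Sample story</title>
      <description>&lt;p&gt;&lt;a href="http://example.com/story"&gt;Read more&lt;/a&gt;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
XML;

$rss  = simplexml_load_string($feed);
$desc = $rss->channel->item->description;

// The escaped markup is text, so the XML tree contains no <a> elements.
$xmlAnchors = $rss->xpath('//a');
echo count($xmlAnchors), "\n";

// Casting the element to a string yields its text content, which is the
// HTML fragment that still has to be parsed separately.
$html = (string) $desc;
echo $html, "\n";
```

The string in `$html` is what gets handed to `DOMDocument::loadHTML()` in the answer's loop.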
    

    XPath is also available for use with DOMDocument; see DOMXPath.
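
A minimal sketch of the DOMXPath route, collecting every link rather than only the first. The helper name `extractHrefs` and the sample HTML are assumptions made for illustration, not part of the original answer. libxml warnings are silenced while parsing because feed descriptions are often not well-formed HTML.

```php
<?php
// Hypothetical helper: returns the href of every <a> in an HTML fragment.
function extractHrefs(string $html): array
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only anchors that actually carry an href attribute.
    $xpath = new DOMXPath($dom);
    $hrefs = [];
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }

    return $hrefs;
}

// Made-up sample fragment with two links.
$html = '<p><a href="http://example.com/one">One</a> and '
      . '<a href="http://example.com/two">Two</a></p>';
print_r(extractHrefs($html));
```

Inside the feed loop this could be called as `extractHrefs((string) $desc)`, which also covers descriptions that contain no links at all.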

    This answer was selected by the asker as the best answer.
