dsfsda121545 2016-02-12 10:16

Why doesn't my XPath query work?

I want to extract the title and the YouTube link from the following XML:

`<?xml version="1.0" encoding="UTF-8"?><feed     xmlns="http://www.w3.org/2005/Atom"><category term="videos" label="/r/videos"/>    <icon>https://www.redditstatic.com/icon.png/</icon><id>/r/videos/.xml</id><link     rel="self" href="https://www.reddit.com/r/videos/.xml"     type="application/atom+xml" /><link rel="alternate" href="https://www.reddit.com/r/videos/" type="text/html" /><logo>https://a.thumbs.redditmedia.com/mtwnduVr0DnrK1o8rpTPi6waLWuPimj_8ntK8i5t890.png</logo><subtitle>A great place for video content of all kinds.</subtitle><title>Videos</title><entry><author><name>/u/LegendaryContent</name><uri>https://www.reddit.com/user/LegendaryContent</uri></author><category term="videos" label="/r/videos"/><content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href=&quot;https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/&quot;&gt; &lt;img src=&quot;https://b.thumbs.redditmedia.com/UR4XFRqoMtj5watvSUrUlEdTYiA1gOv_OxqxtxNyftQ.jpg&quot; alt=&quot;1,400 Employees being laid off&quot; title=&quot;1,400 Employees being laid off&quot; /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &amp;#32; submitted by &amp;#32; &lt;a href=&quot;https://www.reddit.com/user/LegendaryContent&quot;&gt; /u/LegendaryContent &lt;/a&gt; &lt;br/&gt; &lt;span&gt;&lt;a href=&quot;https://youtu.be/Y3ttxGMQOrY&quot;&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href=&quot;https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/&quot;&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content><id>t3_45crp7</id><link href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/" /><updated>2016-02-12T03:22:38+00:00</updated><title>1,400 Employees being laid off</title></entry></feed>`

My code is here:

<?php
$videos = "";
$video_category = "Trending Videos";
$url = "https://www.reddit.com/r/videos/.xml";

// Load the Atom feed; preserveWhiteSpace has to be set before load().
$feed_dom = new DOMDocument;
$feed_dom->preserveWhiteSpace = false;
$feed_dom->load($url);
$items = $feed_dom->getElementsByTagName('entry');

foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    // The <content> element holds an escaped HTML table.
    $desc_table = $item->getElementsByTagName('content')->item(0)->nodeValue;

    // Parse that HTML fragment as its own document.
    $table_dom = new DOMDocument;
    $table_dom->preserveWhiteSpace = false;
    $table_dom->loadHTML($desc_table);
    $xpath = new DOMXPath($table_dom);
    // The XPath query that fails to return the YouTube link.
    $yt_link_node = $xpath->query("//table/tr/td[2]/a[2]");

    foreach ($yt_link_node as $yt_link) {
        $yt = $yt_link->getAttribute('href');
        echo $title;
        echo $yt;
    }
}
?>

For some reason it isn't working, even though I have tried almost every XPath query I could find on Google and Stack Overflow. The title is echoed correctly, but $yt is not. Can you spot what I am doing wrong?


1 answer

  • donglizuo8892 2016-02-12 10:44

    It's because the DOM is slightly different from what you seem to expect.

    The HTML you are parsing there ($desc_table) typically has this structure:

    <table>
        <tr>
            <td>
                <a href="https://www.reddit.com/r/videos/comments/...">
                    <img src="https://b.thumbs.redditmedia.com/....jpg" 
                         alt="..." title="..." />
                </a>
            </td>
            <td> &#32; submitted by &#32; 
                <a href="https://www.reddit.com/user/..."> /u/... </a> 
                <br/> 
                <span>
                    <a href="https://youtu.be/...">[link]</a>
                </span>
                &#32;
                <span>
                    <a href="https://www.reddit.com/r/videos/comments/.../">[comments]</a>
                </span> 
            </td>
        </tr>
    </table>
    

    So the second td element has only one anchor element (a) as a direct child. The second and third anchors are each wrapped in a span tag, so td[2]/a[2] matches nothing.
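
The difference between direct children and descendants can be checked by counting matches. The following is a minimal sketch, assuming a simplified copy of the HTML from the question's feed entry (the thumbnail image is replaced by plain text, and the variable names are made up for illustration):

```php
<?php
// Simplified copy of the <content> HTML from the question's feed entry.
$html = <<<HTML
<table><tr>
<td><a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">thumbnail</a></td>
<td> submitted by <a href="https://www.reddit.com/user/LegendaryContent"> /u/LegendaryContent </a> <br/>
<span><a href="https://youtu.be/Y3ttxGMQOrY">[link]</a></span>
<span><a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">[comments]</a></span>
</td></tr></table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "/" selects direct children only; "//" selects descendants at any depth.
$directSecond = $xpath->query('//table/tr/td[2]/a[2]')->length; // second <a> child of td[2]
$directAll    = $xpath->query('//table/tr/td[2]/a')->length;    // all <a> children of td[2]
$descendants  = $xpath->query('//table/tr/td[2]//a')->length;   // all <a> descendants of td[2]

echo "td[2]/a[2] matches: $directSecond\n"; // 0
echo "td[2]/a    matches: $directAll\n";    // 1
echo "td[2]//a   matches: $descendants\n";  // 3
```

Only the user-profile link is a direct child of the second cell, which is why the original query comes back empty.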

    So if you want to get to this link:

                    <a href="https://youtu.be/...">[link]</a>
    

    then use this XPath instead:

     $yt_link_node = $xpath->query("//table/tr/td[2]/span[1]/a");
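
As a rough, self-contained check of the query above, the sketch below wraps it in a small helper. The function name extractLinkHref and the simplified sample HTML are assumptions for illustration and are not part of the original code; the URLs come from the feed entry in the question.

```php
<?php
/**
 * Return the href of the "[link]" anchor in an entry's HTML content,
 * or null when the expected table structure is not present.
 * (extractLinkHref is a hypothetical helper name.)
 */
function extractLinkHref(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // First <span> child of the second cell, then its <a> child.
    $nodes = $xpath->query('//table/tr/td[2]/span[1]/a');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('href');
}

// Simplified sample of the <content> HTML (thumbnail image replaced by text).
$html = '<table><tr><td><a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">thumbnail</a></td>'
      . '<td> submitted by <a href="https://www.reddit.com/user/LegendaryContent"> /u/LegendaryContent </a> <br/>'
      . '<span><a href="https://youtu.be/Y3ttxGMQOrY">[link]</a></span> '
      . '<span><a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">[comments]</a></span>'
      . '</td></tr></table>';

$link = extractLinkHref($html);
echo $link, "\n"; // https://youtu.be/Y3ttxGMQOrY
```

Returning null when nothing matches keeps the calling loop from echoing a title without a link if an entry's layout differs.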
    
    This answer was accepted by the asker as the best answer.