douou6696 2011-03-25 15:54
52 views
Answer accepted

A little help with this XPath?

I am getting some info from an RSS feed.

<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->load('http://www.myrss.com');
libxml_clear_errors();

$xPath = new DOMXPath($dom);
$links = $xPath->query('xxxxx');
foreach($links as $link) {
    printf("%s\n", $link->nodeValue);
}
?>

I have managed to get the TITLE, LINK and DESCRIPTION with //item/title and so on; however, I want to get the text content and the image of the description separately.

Viewing the page source in Firefox, this is the markup I see for the image and the content. Both are inside <description></description>.

IMAGE

<div class="separator" style="clear: both; text-align: center;"><a href="LINK TO IMAGE" imageanchor="1" 
style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="192" 
src="LINK TO IMAGE" width="320" /></a></div>

CONTENT TEXT

<span class="Apple-style-span" style="font-family: 'Trebuchet MS', sans-serif;"> CONTENT TEXT IS HERE </span>

Which XPath expression should I use to get this data? Thank you.

3 answers

  • dongquanyu5816 2011-03-25 17:15

    If it is what it looks like and the content is HTML-encoded, you can't do it in one step. You must retrieve the text of every description and parse it into its own DOM (unless you want to resort to regex, which I would strongly discourage).
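
To illustrate why one step is not enough, here is a minimal sketch. It assumes a made-up one-item feed (the string in `$rss` and the example.com URL are hypothetical, not taken from the real feed) whose description contains HTML-encoded markup. An XPath query against the feed's DOM finds no `<img>` element, because the markup exists only as the string value of `<description>`.

```php
<?php
// Hypothetical one-item feed; the description holds HTML-encoded markup.
$rss = '<rss version="2.0"><channel><item>'
     . '<title>Example</title>'
     . '<description>&lt;div&gt;&lt;img src="http://example.com/a.jpg" /&gt;&lt;/div&gt;'
     . '&lt;span class="Apple-style-span"&gt;Some text&lt;/span&gt;</description>'
     . '</item></channel></rss>';

$dom = new DOMDocument;
$dom->loadXML($rss);
$xPath = new DOMXPath($dom);

// The feed's DOM contains no <img> element: the markup is plain text here.
$imgCountInFeed = $xPath->query('//item/description//img')->length;

// The decoded markup is only available as the string value of <description>.
$descriptionHtml = $xPath->query('//item/description')->item(0)->nodeValue;
```

In this sketch `$imgCountInFeed` is 0, while `$descriptionHtml` is an ordinary HTML string that still has to be parsed separately.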

    When in doubt, you can pass it through Tidy first. DOMDocument has loadHTML(), which is pretty resilient, but it is not guaranteed to load arbitrary HTML.
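
One possible way to combine the two, as a sketch only: `loadHtmlFragment()` is a hypothetical helper name, and the Tidy options used (`show-body-only`, `output-xhtml`) are just one plausible configuration. The Tidy pass runs only if the extension is installed; otherwise the fragment goes straight to `loadHTML()` with libxml warnings collected internally.

```php
<?php
// Hypothetical helper, not part of the original answer.
function loadHtmlFragment(string $html): DOMDocument
{
    // Optional clean-up pass, only when the Tidy extension is available.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string(
            $html,
            ['show-body-only' => true, 'output-xhtml' => true],
            'utf8'
        );
        // Keep the original markup if Tidy gave up and returned nothing.
        if (is_string($repaired) && $repaired !== '') {
            $html = $repaired;
        }
    }

    // Collect parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The wrapper <div> ensures loadHTML() never receives an empty string.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Sloppy input: an unclosed <span> and a non-standard attribute.
$doc = loadHtmlFragment('<a href="#" imageanchor="1"><img src="a.jpg"></a><span class="x">text');
$imgCount = (new DOMXPath($doc))->query('//img')->length;
```

Restoring the previous `libxml_use_internal_errors()` setting keeps the helper from changing error handling for the rest of the script.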

    // beware, this is untested. it should give you an idea, though.
    
    $dom = new DOMDocument;
    libxml_use_internal_errors(TRUE);
    
    $dom->load('http://www.myrss.com');
    libxml_clear_errors();
    
    $xPath = new DOMXPath($dom);
    $items = $xPath->query('/rss/channel/item');
    
    foreach($items as $item) {
        $descr = $xPath->query('./description', $item);
        // there should be at most one, but foreach gracefully
        // handles the case where there is no <description>
        foreach ($descr as $d) {
            $temp_dom = new DOMDocument();
            $temp_dom->loadHTML( $d->nodeValue );   // error handling/Tidy here!
    
            $temp_xpath = new DOMXPath($temp_dom);
    
            $img = $temp_xpath->query('//img');
            $txt = $temp_xpath->query('//span[@class="Apple-style-span"]');
    
            // now do something with $img and $txt
        }
    
    }
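
As for "do something with $img and $txt", a minimal sketch could look like the following. `splitDescription()` is a hypothetical helper, and `$sample` is modelled on the markup quoted in the question with placeholder example.com URLs. It reads the `src` attribute of each image and the text content of each matching span.

```php
<?php
// Hypothetical helper: splits one description's HTML into image URLs and text.
function splitDescription(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The wrapper <div> ensures loadHTML() never receives an empty string.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Collect the src attribute of every <img>.
    $images = [];
    foreach ($xpath->query('//img/@src') as $src) {
        $images[] = $src->nodeValue;
    }

    // Collect the text of every span carrying the class from the question.
    $texts = [];
    foreach ($xpath->query('//span[@class="Apple-style-span"]') as $span) {
        $texts[] = trim($span->textContent);
    }

    return ['images' => $images, 'text' => implode(' ', $texts)];
}

// Sample markup modelled on the question; the URLs are placeholders.
$sample = '<div class="separator"><a href="http://example.com/full.jpg" imageanchor="1">'
        . '<img border="0" height="192" src="http://example.com/thumb.jpg" width="320" /></a></div>'
        . '<span class="Apple-style-span" style="font-family: \'Trebuchet MS\', sans-serif;">'
        . ' CONTENT TEXT IS HERE </span>';

$parts = splitDescription($sample);
```

Inside the loop from the answer, the same call would be `splitDescription($d->nodeValue)`. Matching on `@class="Apple-style-span"` is fragile if the feed's markup changes, so a looser query may be needed for other feeds.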
    
    This answer was selected by the asker as the best answer.