duanmao1319 2013-03-15 18:19
Accepted

Traversing the XML response of the Yandex API with PHP

I am creating a metasearch engine using the Yandex API. Yandex returns results in XML format, so we need to traverse the XML response in order to extract fields such as the URL, title, and description.

The XML response by Yandex is as follows: http://pastebin.com/kAVAVri9

This is how I have implemented it:

$dom5 = new DOMDocument();

if ($dom5->loadXML($site_results)) {

    $results  = $dom5->getElementsByTagName("response");
    $results1 = $results->getElementsByTagName("results");
    $results2 = $results1->getElementsByTagName("group");


    $totals["yandex"] = 1000;


    foreach ($results1 as $link) {

        $url = $link->getElementsByTagName("doc")->item(2)->nodeValue;
        ;
        $url = str_replace('http://', '', $url);
        if (substr($url, -1, 1) == '/') {
            $url = substr($url, 0, strlen($url) - 1);
        }
        $search_results[$i]["url"] = $url;

        $title                       = $link->getElementsByTagName("doc")->item(4)->nodeValue;
        $search_results[$i]["title"] = $title;
        $test                        = $link->getElementsByTagName("doc");
        $test1                       = $test->getElementsByTagName("title");
        $desc                        = $test1->getElementsByTagName("headline")->item(0)->nodeValue;
        $search_results[$i]["desc"]  = $desc;

        $search_results[$i]["engine"]   = 'yandex';
        $search_results[$i]["position"] = $i + 1;
        $i++;

    }
}

I am new to PHP, so please forgive me if I have made a stupid mistake. I am unable to retrieve the results with my implementation. Please help me find the mistake and get the necessary fields from the XML response. Thank you!


1 answer

  • dtbc37573 2013-03-16 00:42

    The method getElementsByTagName() returns a DOMNodeList:

    $results  = $dom5->getElementsByTagName("response");
    

    The DOMNodeList does not have a method called getElementsByTagName(), but you call it:

    $results1 = $results->getElementsByTagName("results");
    

    This is what triggers the fatal error: whenever you call a method that does not exist on an object, PHP raises a fatal error and your script stops.

    Do not call undefined object methods and you should be fine.
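
To make the point above concrete, here is a minimal sketch of the DOMDocument approach with the node-list mistake avoided. The sample XML is hypothetical (the real response is only linked, not shown here); its element names are assumed from the path and property names used later in this answer, and `firstText()` is a helper invented for this illustration.

```php
<?php
// Hypothetical sample response; element names are assumed, not taken from the real pastebin.
$site_results = <<<'XML'
<yandexsearch version="1.0">
  <response>
    <results>
      <grouping>
        <group>
          <doc>
            <url>http://www.example.com/</url>
            <title>Example title</title>
            <headline>Example description</headline>
          </doc>
        </group>
        <group>
          <doc>
            <url>http://www.example.org/page</url>
            <title>Second title</title>
          </doc>
        </group>
      </grouping>
    </results>
  </response>
</yandexsearch>
XML;

// Hypothetical helper: text of the first descendant with the given tag name, or ''.
function firstText(DOMElement $parent, string $tag): string
{
    // getElementsByTagName() returns a DOMNodeList; item(0) yields a node or null.
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? '' : trim($node->textContent);
}

$search_results = [];
$dom = new DOMDocument();
if ($dom->loadXML($site_results)) {
    // A DOMNodeList can be iterated; each item is a DOMElement, which does
    // have getElementsByTagName(), unlike the list itself.
    foreach ($dom->getElementsByTagName('group') as $group) {
        $search_results[] = [
            'url'      => firstText($group, 'url'),
            'title'    => firstText($group, 'title'),
            'desc'     => firstText($group, 'headline'),
            'engine'   => 'yandex',
            'position' => count($search_results) + 1,
        ];
    }
}
print_r($search_results);
```

Because getElementsByTagName() searches all descendants, the intermediate `response` and `results` levels do not need separate lookups in this sketch.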

    Apart from these basics, I normally suggest SimpleXML for parsing such XML documents. This XML file is a little specific, however, so I suggest extending SimpleXML and adding the features you are likely to need, partly from regular expressions and partly from DOMDocument.

    One concept you should know about when parsing these XML files is XPath. For example, to access the elements you had so many problems with above, you can write the path literally:

    /*/response/results/grouping/group
    

    In PHP with SimpleXML this looks like:

    $url = 'http://pastebin.com/raw.php?i=kAVAVri9';
    $xml = simplexml_load_file($url, 'MySimpleXML');
    foreach ($xml->xpath('/*/response/results/grouping/group') as $link) {
        # ... operate on $link
    }
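
For trying the XPath query without network access or the custom class, a self-contained sketch with the stock SimpleXMLElement might look like this. The XML string is a made-up stand-in for the real response, and `$titlesByUrl` is a name chosen for this illustration only.

```php
<?php
// Hypothetical sample response; the nesting mirrors the XPath shown above.
$xmlString = <<<'XML'
<yandexsearch version="1.0">
  <response>
    <results>
      <grouping>
        <group><doc><url>http://www.example.com/</url><title>First result</title></doc></group>
        <group><doc><url>http://www.example.org/page</url><title>Second result</title></doc></group>
      </grouping>
    </results>
  </response>
</yandexsearch>
XML;

$titlesByUrl = [];
$xml = simplexml_load_string($xmlString);
if ($xml !== false) {
    // xpath() returns an array of SimpleXMLElement objects, one per matching <group>.
    foreach ($xml->xpath('/*/response/results/grouping/group') as $group) {
        // Child elements are reachable as properties; cast to string to get their text.
        $titlesByUrl[(string) $group->doc->url] = (string) $group->doc->title;
    }
}
print_r($titlesByUrl);
```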
    

    A larger example:

    $url = 'http://pastebin.com/raw.php?i=kAVAVri9';
    // $url = '../data/yandex.xml'; // uncomment to read from a local file instead
    $xml = simplexml_load_file($url, 'MySimpleXML');
    foreach ($xml->xpath('/*/response/results/grouping/group') as $link) {
        $url      = $link->doc->url->str()->preg('~^https?://(.*?)/*$~u', '$1');
        $title    = $link->doc->title->text();
        $headline = $link->doc->headline->text();
        printf("<%s> %s\n%s\n\n", $url, $title, wordwrap($headline));
    }
    

    And its example output:

    <www.facebook.com> " Facebook" - a social networking service
    Allows users to find and communicate with friends, classmates and
    colleagues, share thoughts, photos and videos, and join various groups.
    
    <en.wikipedia.org/wiki/Facebook>  Facebook - Wikipedia, the free encyclopedia
     Facebook is a social networking service launched in February 2004, owned
    and operated by Facebook, Inc. As of September 2012, Facebook has over one
    billion active users, more than half of them using Facebook on a mobile
    device.
    
    <mashable.com/category/facebook>  Facebook 
    
    ...
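
The regular expression in the larger example strips the scheme and any trailing slashes in one step, which replaces the `str_replace()` and `substr()` calls from the question. As a rough standalone sketch (the function name `normalizeUrl` is made up for this illustration):

```php
<?php
// Hypothetical helper: drop a leading http:// or https:// and any trailing slashes.
function normalizeUrl(string $url): string
{
    // The lazy group (.*?) stops as early as possible, so "/*$" absorbs all trailing slashes.
    $result = preg_replace('~^https?://(.*?)/*$~u', '$1', $url);

    // preg_replace() returns null on error (for example invalid UTF-8 with the u modifier);
    // fall back to the unchanged input in that case.
    return $result ?? $url;
}

foreach (['http://www.facebook.com/', 'http://en.wikipedia.org/wiki/Facebook'] as $url) {
    echo normalizeUrl($url), "\n";
}
```

Unlike the `str_replace('http://', '', $url)` in the question, the anchored pattern only touches the start of the string and also handles `https://`.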
    

    The PHP example above needs some more code to work because it uses a class that extends SimpleXMLElement for ease of use. This is done with the following code:

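    class MySimpleXML extends SimpleXMLElement
    {
        // Full text of this element, including text inside child elements,
        // with runs of whitespace collapsed to single spaces.
        public function text()
        {
            $string = null === $this[0] ? ''
                : (dom_import_simplexml($this)->textContent);
    
            return $this->str($string)->normalizeWS();
        }
    
        // Wrap the given string (or this element's own string value) in a MyString.
        public function str($string = null)
        {
            return new MyString($string ?: $this);
        }
    }
    
    // Small string wrapper that allows chaining regular-expression replacements.
    class MyString
    {
        private $string;
    
        public function __construct($string)
        {
            $this->string = $string;
        }
    
        // Apply preg_replace() to the wrapped string and return a new MyString.
        public function preg($pattern, $replacement)
        {
            return new self(preg_replace($pattern, $replacement, $this));
        }
    
        // Collapse every run of whitespace into a single space.
        public function normalizeWS()
        {
            return $this->preg('~\s+~', ' ');
        }
    
        public function __toString()
        {
            return (string) $this->string;
        }
    }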
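
One reason for going through `dom_import_simplexml()` in `text()` is that casting a SimpleXMLElement to string only returns the text directly inside that element, not the text of its children. The sketch below illustrates the difference; the `<hlword>` child element is an assumption about how the response marks up highlighted words, not something shown in this answer.

```php
<?php
// Hypothetical title element; the <hlword> child is an assumed example of nested markup.
$title = simplexml_load_string(
    "<title><hlword>Facebook</hlword> -\n   Wikipedia</title>"
);

// Casting to string yields only the element's own text nodes.
$direct = (string) $title;

// textContent yields the text of the element and all of its descendants.
$full = dom_import_simplexml($title)->textContent;

// Collapse whitespace runs the same way the whitespace helper in MyString does.
$normalized = preg_replace('~\s+~', ' ', $full);

var_dump($direct, $full, $normalized);
```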
    

    This might all be a little much to begin with; check out the PHP manual for SimpleXML and the other functions used in the code example.

    This answer was selected as the best answer by the asker.
