doutale7115 2014-09-05 08:02

PHP Simple HTML DOM Parser

I am working on a simple web crawler. Below is the simple HTML I am using to learn.

input.php

<ul id="nav">
    <li>
        <a href="www.google.com">Google</a>
        <ul>
            <li>
                <a href="mail.gmail.com">Gmail</a>
            </li>
        </ul>
    </li>
    <li>
        <a href="www.yahoo.com">Yahoo</a>
        <ul>
            <li>
                <a href="mail.yahoo.com">Yahoo Mail</a>
            </li>
        </ul>
    </li>
</ul>

I need to crawl the first anchor tag in each ul[id=nav] > li. The code I use to crawl input.php is:

<?php
    include 'simple_html_dom.php';
    $html = file_get_html('input.php');

    foreach ($html->find('ul[id=nav]') as $navUL){
        foreach ($navUL->find('li') as $navUL_LI){
            echo $navUL_LI->find('a',0)->outertext."<br>";              
        }
    }
?>

It displays all the anchor tags in input.php. I need to display only Google and Yahoo. How can I achieve this?


  • dongzhi5587 2014-09-05 08:14

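    find('li') matches every descendant <li>, including the nested ones, which is why all four links are printed. In this case you can target the top-level items directly with the children() method, which returns only direct children. Example: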

    foreach($html->find('ul#nav') as $ul) {
        foreach($ul->children() as $li) {
            echo $li->children(0)->outertext . '<br/>';
        }
    }
    

    Alternatively, you can use DOMDocument + DOMXPath for this:

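    // Assumption: $str holds the HTML markup, e.g. the contents of input.php
    $str = file_get_contents('input.php');
    $dom = new DOMDocument();
    $dom->loadHTML($str);
    $xpath = new DOMXPath($dom);
    // directly target the links that are direct children of the top-level <li> elements
    $links = $xpath->query('//ul[@id="nav"]/li/a');
    
    foreach ($links as $a) {
        // prints the link text; use $dom->saveHTML($a) to print the whole <a> tag instead
        echo $a->nodeValue . '<br/>';
    }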
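
To see why the child step matters, here is a minimal, self-contained sketch that runs two XPath queries against the markup from the question. The helper name linkTexts() is hypothetical (it is not part of any library), and the descendant query only roughly mirrors what the find('li') loop in the question does. It uses only PHP's built-in DOM extension.

```php
<?php
// Sample markup copied from the question's input.php.
$html = <<<'HTML'
<ul id="nav">
    <li>
        <a href="www.google.com">Google</a>
        <ul>
            <li><a href="mail.gmail.com">Gmail</a></li>
        </ul>
    </li>
    <li>
        <a href="www.yahoo.com">Yahoo</a>
        <ul>
            <li><a href="mail.yahoo.com">Yahoo Mail</a></li>
        </ul>
    </li>
</ul>
HTML;

/**
 * Hypothetical helper: returns the trimmed text of every node matched by $query.
 */
function linkTexts(string $html, string $query): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// "//li" walks all descendants, so nested <li> elements match as well.
$all = linkTexts($html, '//ul[@id="nav"]//li/a');

// "/li/a" follows only direct children: top-level <li>, then its own <a>.
$top = linkTexts($html, '//ul[@id="nav"]/li/a');

echo implode(', ', $all), PHP_EOL; // Google, Gmail, Yahoo, Yahoo Mail
echo implode(', ', $top), PHP_EOL; // Google, Yahoo
```

The only difference between the two queries is `//li` versus `/li`, which is the same distinction as find('li') versus children() in Simple HTML DOM.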
    