dongshou1856
2016-07-16 22:50
Views: 22
Accepted

How to get the <a> tags in <body>, excluding the header and footer

If I have a webpage like this:

<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>

How do I pull the <a> tags out of the <body> but exclude the links from <header> and <footer>?

In the real web page, there will be a lot of <a> tags in the <header> so I'd rather not have to cycle through ALL of them.

I want to pull out the URLs and anchor text from each of the <a> tags that are NOT inside the <header> or <footer> tags.

EDIT: this is how I find links in the header:

$header = $html->find('header', 0);
foreach ($header->find('a') as $a) {
  // do something
}

I would like to do something like this (note the use of "!", which is not real selector syntax):

$foo = $html->find('!header,!footer');
foreach ($foo->find('a') as $a) {
  // do something
}

3 answers

  • duanli9591 2016-07-16 23:07
    Best answer

    Remove the header and footer from the DOM you are working with before looking for the links.

    <?php
        include("simple_html_dom.php");
        $source = <<<EOD
        <body>
            <header>
                <a href='http://domain1.com'>link 1 text</a>
            </header>
    
            <a href='http://domain2.com'>link 2 text</a>
    
            <a href='http://domain4.com'>link 4 text</a>
    
            <footer>
                <a href='http://domain3.com'>link 3 text</a>
            </footer>
        </body>
    EOD;
    
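        $html = str_get_html($source);
        // Blank out every <header> and <footer> element, including the links inside them.
        foreach ($html->find('header, footer') as $unwanted) {
            $unwanted->outertext = "";
        }
        // Re-parse the modified markup so the removed nodes no longer show up in searches.
        $html->load($html->save());
        $links = $html->find("a");
        foreach ($links as $link) {
            // Print the URL and the anchor text of each remaining link.
            print $link->href . " " . $link->plaintext . "\n";
        }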
    
    ?>
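
If you would rather not modify the DOM, a different approach is to filter by ancestor at query time. The sketch below does not use simple_html_dom at all; it assumes PHP's bundled DOM extension (DOMDocument and DOMXPath) is available, and the function name linksOutsideHeaderFooter is made up for illustration. The XPath predicate keeps only <a> elements that have no <header> or <footer> ancestor.

```php
<?php
// Sketch: select <a> elements that have no <header> or <footer> ancestor,
// using PHP's bundled DOM extension instead of simple_html_dom.

// Hypothetical helper name; returns a list of ['href' => ..., 'text' => ...].
function linksOutsideHeaderFooter(string $source): array
{
    $doc = new DOMDocument();
    // The libxml HTML parser may report errors for HTML5 tags such as
    // <header>; collect them internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every <a> under <body> that is not nested in <header> or <footer>.
    $query = '//body//a[not(ancestor::header) and not(ancestor::footer)]';

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// Same sample page as in the question.
$source = <<<EOD
<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>
EOD;

foreach (linksOutsideHeaderFooter($source) as $link) {
    print $link['href'] . ' ' . $link['text'] . "\n";
}
```

With this approach the filtering happens inside the XPath query, so your own loop only visits the links you actually want. Whether it is faster than blanking out the header and footer on a large page is something you would need to measure.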
    