douzhi3105 2013-11-28 08:56
Accepted

Regular expression: finding tags between specific tags

I have some HTML that contains many links (hrefs), but I don't need all of them. I want only the ones contained in this div:

<div class="category-map second-links"> 
*****
</div> <p class="sec">

What I want to see as a result:

<a href='xxx'>yyy</a>
<a href='zzz'>www</a>
...

My version (not working):

(?<=<div class=\"category-map second-links\">)(.+?(<a href=\".+?".+?>.+<\/a>))+(?=<\/div> <p class="sec">)

  • drne47241 2013-11-28 15:56

    If you load your HTML into a DOM document, you can use XPath to query nodes from it.

    All a elements inside the document:

    • //a

    That have a div element as an ancestor (a direct parent counts too):

    • //a[ancestor::div]

    With the class attribute set to category-map second-links:

    • //a[ancestor::div[@class = "category-map second-links"]]

    Optionally, get the href attributes of the filtered a elements:

    • //a[ancestor::div[@class = "category-map second-links"]]/@href
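
    To see how each step narrows the result, the expressions above can be run against a small document and their match counts compared. This is only a sketch: the sample markup and the variable names ($expressions, $counts) are invented for illustration and do not come from the question.

```php
<?php
// Hypothetical sample markup, made up for this sketch.
$html = '<html><body>'
      . '<a href="top">outside any div</a>'
      . '<div class="other"><a href="o1">in another div</a></div>'
      . '<div class="category-map second-links">'
      . '<a href="xxx">yyy</a><a href="zzz">www</a>'
      . '</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The four expressions from the list above, from broadest to narrowest.
$expressions = array(
  '//a',
  '//a[ancestor::div]',
  '//a[ancestor::div[@class = "category-map second-links"]]',
  '//a[ancestor::div[@class = "category-map second-links"]]/@href',
);

$counts = array();
foreach ($expressions as $expression) {
  // query() returns a DOMNodeList; length is the number of matched nodes.
  $counts[$expression] = $xpath->query($expression)->length;
}
var_dump($counts);
```

    Note that @class = "..." compares the whole attribute value, so a div with extra classes or a different class order would not match this expression.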

    Full Example:

    $html = <<<'HTML'
    <div class="category-map second-links"> 
    *****
        <!--<div class="category-map second-links"> Comment hacks --> 
        <div class="category-map second-links">
            <a href='xxx'>yyy</a>
            <a href='zzz'>www</a>
    ...
        </div>
    <div class="category-map second-links"> 
    *****
        <!--<div class="category-map second-links"> Comment hacks --> 
        <div class="category-map second-links">
            <a href='aaa'>bbb</a>
            <a href='ccc'>ddd</a>
    ...
        </div>
    </div> <p class="sec">
    HTML;
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    
    // fetch the href attributes
    $hrefs = array();
    foreach ($xpath->evaluate('//a[ancestor::div[@class = "category-map second-links"]]/@href') as $node) {
      $hrefs[] = $node->value;
    }
    var_dump($hrefs);
    
    // fetch the a elements and read some data from them
    $linkData = array();
    foreach ($xpath->evaluate('//a[ancestor::div[@class = "category-map second-links"]]') as $node) {
      $linkData[] = array(
        'href' => $node->getAttribute('href'),
        'text' => $node->nodeValue,
      );
    }
    var_dump($linkData);
    
    // fetch the a elements and store their html
    $links = array();
    foreach ($xpath->evaluate('//a[ancestor::div[@class = "category-map second-links"]]') as $node) {
      $links[] = $dom->saveHTML($node);
    }
    var_dump($links);
    