doujiao1949 2013-12-19 11:19

How do I get the text links from an HTML page? [duplicate]


I want to get the links "http://www.w3schools.com/default.asp" and "http://www.google.com" from this webpage. I only want the links from the <a> tags inside <div class="link">; the page contains many other <a> tags that I don't want. How can I retrieve only these particular links?

<div class="link">
<a href="http://www.w3schools.com/default.asp">
<h4>W3 Schools</h4>
</a>
</div>
<div class="link">
<a href="http://www.google.com">
<h4>Google</h4>
</a>
</div>
</div>

3 answers

  • douzhe3516 2013-12-19 11:22

    Use a DOM Parser such as DOMDocument to achieve this:

    $dom = new DOMDocument;
    $dom->loadHTML($html); // $html is a string containing the HTML
    
    foreach ($dom->getElementsByTagName('a') as $link) {
        echo $link->getAttribute('href').'<br/>';
    }
    

    Output:

    http://www.w3schools.com/default.asp
    http://www.google.com
    



    UPDATE: If you only want the links inside the specific <div>, you can use an XPath expression to find the links inside the div, and then loop through them to get the href attribute:

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    
    $xpath = new DOMXPath($dom);
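    // Match 'link' as a whole class token, so that classes such as 'links' are not picked up
    $links_inside_div = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' link ')]/a");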
    
    foreach ($links_inside_div as $link) {
        echo $link->getAttribute('href').'<br/>';
    }
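
If you need the links as an array rather than echoed output, the XPath approach above can be wrapped in a small function. This is only a sketch: the name `extractDivLinks`, its `$class` parameter, and the extra `<div class="links">` in the sample markup are made up for illustration and are not part of the question. It assumes the class name contains no quote characters, and it silences libxml parse warnings, which real-world pages tend to trigger.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of every
// <a> that is a direct child of a <div> whose class list contains $class.
function extractDivLinks(string $html, string $class = 'link'): array
{
    $dom = new DOMDocument;

    // Collect libxml warnings internally instead of printing them,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Pad the class attribute with spaces so that $class matches as a whole
    // token and not as a substring of e.g. 'links'.
    // Assumption: $class contains no quote characters.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]/a[@href]",
        $class
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }

    return $hrefs;
}

// Sample markup based on the question, plus one unwanted link for contrast.
$html = <<<'HTML'
<div class="links"><a href="http://example.com/ignored">Not wanted</a></div>
<div class="link">
<a href="http://www.w3schools.com/default.asp"><h4>W3 Schools</h4></a>
</div>
<div class="link">
<a href="http://www.google.com"><h4>Google</h4></a>
</div>
HTML;

print_r(extractDivLinks($html));
```

Returning an array keeps the extraction separate from the presentation, so the caller decides whether to echo the links, store them, or filter them further.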
    


    This answer was selected as the best answer by the asker.