dongshuofu0039 2018-05-31 15:24
Views: 132
Accepted

Regular expression to match all links whose anchor text contains specific words?

I am looking for a regular expression in PHP that extracts, from a block of text, the links whose anchor text contains specific words (apple, home, car).

Important: the formatting of links is not known in advance.

E.g:

<a href="fruit.html">The Apple red</a>
<a href="Construction.html#one">The big Home</a>
<a href="automotive.html?lang=en">Car for rent</a>

Desired result:

fruit.html
Construction.html#one
automotive.html?lang=en

My pattern:

/<a.*?href="(.*)".*?>apple|car|home<\/a>/i

Update: This pattern works

'/<a.+href=["\'](.*)["\'].*>(.*(?:apple|car|home).*)<\/a>/iU'

1 answer

  • duanjiao8007 2018-06-04 09:43

    You could make use of DOMDocument and use getElementsByTagName to get the <a> elements.

    Then you can run preg_match on each anchor's text with a regex that alternates the words you want to find, wrapped in word boundaries (\b) so that a word only matches when it stands alone and not as part of a longer word. The /i flag makes the match case-insensitive. The sample uses apple, big and car as the words; substitute your own list.

    \b(?:apple|big|car)\b
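To see what the word boundaries buy you, here is a minimal sketch that runs the same pattern over a few anchor texts. The sample strings and the `$results` variable are illustrative only and are not part of the question.

```php
<?php
// Same pattern as above: whole-word, case-insensitive alternation.
$pattern = '#\b(?:apple|big|car)\b#i';

// Illustrative anchor texts: two that should match, two that should not.
$samples = ['The Apple red', 'The Pineapple red', 'Car for rent', 'Cars for rent'];

$results = [];
foreach ($samples as $text) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$text] = preg_match($pattern, $text) === 1;
}

var_dump($results);
```

"Pineapple" and "Cars" are rejected because the characters next to "apple" and "car" are word characters, so there is no boundary at that position.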

    $data = <<<DATA
    <a href="fruit.html">The Apple red</a>
    <a href="Construction.html#one">The big Home</a>
    <a href="automotive.html?lang=en">Car for rent</a>
    <a href="fruit.html">The Pineapple red</a>
    <a href="Construction.html#one">The biggest Home</a>
    <a href="automotive.html?lang=en">Cars for rent</a>
    DATA;
    
    $dom = new DOMDocument();
    $dom->loadHTML($data);
    
    foreach($dom->getElementsByTagName("a") as $element) {
        if (preg_match('#\b(?:apple|big|car)\b#i', $element->nodeValue)) {
            echo $element->getAttribute("href") . "<br>";
        }
    }
    


    That would give you:

    fruit.html
    Construction.html#one
    automotive.html?lang=en
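If you need this in more than one place, the same idea could be wrapped in a function. This is only a sketch: `linksWithWords` is a hypothetical name, and the sample markup (single-quoted href, extra attribute, nested `<b>`) is made up to reflect the "formatting is not known in advance" requirement.

```php
<?php
/**
 * Hypothetical helper: returns the href of every <a> whose text
 * contains one of $words as a whole word (case-insensitive).
 */
function linksWithWords(string $html, array $words): array
{
    // Escape each word so any regex metacharacters are matched literally.
    $alternation = implode('|', array_map(
        function ($word) { return preg_quote($word, '#'); },
        $words
    ));
    $pattern = '#\b(?:' . $alternation . ')\b#i';

    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $element) {
        // textContent includes the text of nested elements such as <b>.
        if ($element->hasAttribute('href') && preg_match($pattern, $element->textContent)) {
            $hrefs[] = $element->getAttribute('href');
        }
    }
    return $hrefs;
}

// Made-up input with mixed quoting, an extra attribute and nested markup.
$html = "<a href='fruit.html'>The Apple red</a>\n"
      . '<a class="x" href="Construction.html#one">The big <b>Home</b></a>' . "\n"
      . '<a href="fruit.html">The Pineapple red</a>';

$links = linksWithWords($html, ['apple', 'home', 'car']);
print_r($links);
```

Because the parser normalizes quoting and attribute order, the function does not care how the links are written, which is the main reason to prefer it over a single regex on the raw HTML.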
    