doubei5310 2019-06-16 14:01

How to capture links with optional spaces in PHP? [duplicate]

This question already has an answer here:

With file_get_contents I get the HTML code of a URL:

$html = file_get_contents($url);

Now I would like to capture the href links.

The HTML code is:

<li class="four-column mosaicElement">
<a href="https://example.com" title="Lorem ipsum">
...
</a>
</li>
<li class="four-column mosaicElement">
<a href="https://example.org" title="Lorem ipsum">
...
</a>
</li>

So I'm using this:

preg_match_all('/class=\"four-column mosaicElement\"><a href=\"(.+?)\" title=\"(.+?)"/m', $html, $urls, PREG_SET_ORDER, 0);

foreach ($urls as $key => $url) {
    echo $url[1];
}

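This finds no matches, because the HTML has a line break (and possibly extra spaces) between the closing > of the <li> tag and <a, which the pattern does not allow for. How do I solve this problem?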


3 answers

  • douren2831 2019-06-16 15:35

    Here, we can also use an expression with a positive lookahead and optional spaces, just in case:

    (?=class="four-column mosaicElement")[\s\S]*?href="\s*(https?[^\s]+)\s*"
    

    and our desired URLs are in this group:

    (https?[^\s]+)
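The same idea can also be applied directly to the pattern from the question. Its only real problem is that it requires "><a" with nothing in between, while the page has a line break there. Below is a minimal sketch, assuming the markup looks like the sample in the question; extract_mosaic_links() is a hypothetical helper name, and the \s* around the lazy (.+?) is what strips any padding inside the href value:

```php
<?php
// Hypothetical helper: the question's own pattern, with \s* added wherever the
// markup may contain line breaks or padding.
function extract_mosaic_links(string $html): array
{
    // \s* after "> allows the newline before <a; \s* around (.+?) drops
    // spaces inside href="...". The /m flag is not needed (no ^ or $).
    $re = '/class="four-column mosaicElement">\s*<a href="\s*(.+?)\s*" title="(.+?)"/';

    preg_match_all($re, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $match) {
        $links[] = $match[1]; // group 1 = the href value
    }
    return $links;
}

// Usage with markup shaped like the question's sample (second href padded).
$html = '<li class="four-column mosaicElement">
<a href="https://example.com" title="Lorem ipsum">
...
</a>
</li>
<li class="four-column mosaicElement">
<a href="   https://example.org   " title="Lorem ipsum">
...
</a>
</li>';

foreach (extract_mosaic_links($html) as $link) {
    echo $link . "\n";
}
```

This still depends on the exact class string and on href appearing before title, so it is only as robust as the markup is stable.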
    


    TEST

    <?php
    $re = '/(?=class="four-column mosaicElement")[\s\S]*?href="\s*(https?[^\s]+)\s*"/m';
    $str = '<li class="four-column mosaicElement">
    <a href="https://example.com" title="Lorem ipsum">
    ...
    </a>
    </li>
    <li class="four-column mosaicElement">
    <a href="https://example.org" title="Lorem ipsum">
    
    <li class="four-column mosaicElement">
    <a href="   https://example.org   " title="Lorem ipsum">
    
    <li class="four-column mosaicElement">
    <a href="   https://example.org                " title="Lorem ipsum">
    ...
    </a>
    </li>
    ';
    
    preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
    
    // With PREG_SET_ORDER, each $url is one match and $url[1] is the captured URL
    foreach ($matches as $key => $url) {
        echo $url[1] . "\n";
    }
    

    Output

    https://example.com
    https://example.org
    https://example.org
    https://example.org
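Regular expressions break easily when attribute order, quoting or class order changes. As a swapped-in alternative (not the technique this answer uses), here is a sketch that parses the HTML with PHP's DOM extension and selects the links with XPath. It assumes the standard dom extension is available, and mosaic_links_via_dom() is a hypothetical helper name:

```php
<?php
// Alternative technique: DOM parsing + XPath instead of a regular expression.
// mosaic_links_via_dom() is a hypothetical helper name.
function mosaic_links_via_dom(string $html): array
{
    $doc = new DOMDocument();

    // Scraped HTML is often not well-formed: collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // <a href="..."> children of <li> elements whose class list contains both
    // "four-column" and "mosaicElement", in any order.
    $query = '//li[contains(concat(" ", normalize-space(@class), " "), " four-column ")'
        . ' and contains(concat(" ", normalize-space(@class), " "), " mosaicElement ")]'
        . '/a[@href]';

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        // trim() drops padding such as href="   https://example.org   "
        $links[] = trim($anchor->getAttribute('href'));
    }
    return $links;
}

// Usage: the second item lists its classes in reverse order and pads the href.
$html = '<li class="four-column mosaicElement">
<a href="https://example.com" title="Lorem ipsum">...</a>
</li>
<li class="mosaicElement four-column">
<a href="   https://example.org   " title="Lorem ipsum">...</a>
</li>';

print_r(mosaic_links_via_dom($html));
```

Because the parser handles whitespace and attribute order, only the class names and the li > a nesting need to stay the same.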
    

    [Figure: RegEx Circuit, a jex.im visualization of the regular expression]

    This answer was selected by the asker as the best answer.