dsa111111 2016-03-26 09:36
Views: 117
Accepted

PHP regex to match specific URLs and strip other URLs

I wrote this function to convert all URLs on a specific domain (mywebsite.com) into links and replace every other URL with @@@spam@@@.

function get_global_convert_all_urls($content) {
  $content = strtolower($content);
  $replace = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?/im";
  preg_match_all($replace, $content, $search);
  $total = count($search[0]);
  for($i=0; $i < $total; $i++) {
    $url = $search[0][$i];
    if(preg_match('/mywebsite.com/i', $url)) {
      $content = str_replace($url, '<a href="'.$url.'">'.$url.'</a>', $content);
    } else {
      $content = str_replace($url, '@@@spam@@@', $content);
    }
  }

  return $content;
}

The only problem I can't solve is that the regex does not stop at a space when there are two URLs on one line.

$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

Result:

<a href="http://www.mywebsite.com/index.html http://www.others.com/index.html">http://www.mywebsite.com/index.html http://www.others.com/index.html</a>

How can I get the result below?

<a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@

I tried adding (\s|$) at the end of the regex, but no luck:

/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?(\s|$)/im

3 answers

  • douzhulv1699 2016-03-26 09:51

    Edited based on change in your question.

    The problem is the .* at the end of your regex: it also matches spaces, so it runs on to the end of the line. My suggestion is to replace it with a more precise expression. I cooked this up real quick, so you'll want to run some tests to verify your cases. =)

    $matches = null;
    $pattern = '!(?:http|https)?(?:\\:\\/\\/)?(?:www.)?(([A-Za-z0-9-]+\\.)*[A-Za-z0-9-]+\\.[A-Za-z]+)(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\\-\\._\\?\\,\\\'/\\\\\\+&%\\$#\\=~])*[^\\.\\,\\)\\(]!';
    $subject = 'mywebsite.com/index.html others.com/index.html';
    $returnValue = preg_match_all($pattern, $subject, $matches);


    Results in:

    array (
      0 => 
      array (
        0 => 'mywebsite.com/index.html ',
        1 => 'others.com/index.html',
      ),
      1 => 
      array (
        0 => 'mywebsite.com',
        1 => 'others.com',
      ),
      2 => 
      array (
        0 => '',
        1 => '',
      ),
      3 => 
      array (
        0 => '',
        1 => '',
      ),
      4 => 
      array (
        0 => 'l',
        1 => 'm',
      ),
    )
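
Note that the first full match above still ends in a space: the final `[^\.\,\)\(]` consumes one character of any kind except those four, and a space qualifies. As a possible alternative, here is a minimal sketch that keeps the overall shape of the question's pattern but uses `\S*` (non-whitespace only) for the path and does the replacement in a single pass with `preg_replace_callback`. The function name `convert_urls_sketch` is hypothetical, `mywebsite.com` is the placeholder domain from the question, and the sketch assumes you want to keep the original letter case (the question's `strtolower` call is dropped) and HTML-escape the URL. It has not been tested beyond the simple inputs shown.

```php
<?php
// Hypothetical helper, not the code from the answer above.
function convert_urls_sketch(string $content): string
{
    // Same idea as the question's pattern, but the path is /\S* instead of
    // /.* so a match stops at the first whitespace character.
    $pattern = '~(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)*[a-z0-9-]+\.[a-z]+)(?:/\S*)?~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $host = strtolower($m[1]);
        // Accept mywebsite.com and its subdomains, but not look-alikes
        // such as notmywebsite.com.
        if (preg_match('/(^|\.)mywebsite\.com$/', $host)) {
            // Assumption: escaping the URL before putting it into HTML is wanted.
            $url = htmlspecialchars($m[0], ENT_QUOTES);
            return '<a href="' . $url . '">' . $url . '</a>';
        }
        return '@@@spam@@@';
    }, $content);
}

echo convert_urls_sketch('http://www.mywebsite.com/index.html http://www.others.com/index.html'), PHP_EOL;
// prints: <a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@
```

Replacing inside the callback, rather than collecting matches and calling `str_replace` in a loop, means each URL is rewritten exactly once at the position where it was found, so text that has already been turned into a link is not scanned again. Like the question's pattern, this one still treats any `name.tld` token as a URL, so tighten it if that is too broad for your input.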
    
    This answer was selected by the asker as the best answer.