douzhan1935 2013-06-05 05:13

Regex: how to match any string until whitespace, or until punctuation followed by whitespace?

I'm trying to write a regular expression which will find URLs in a plain-text string, so that I can wrap them with anchor tags. I know there are expressions already available for this, but I want to create my own, mostly because I want to know how it works.

Since it's not going to break anything if my regex fails, my plan is to write something fairly simple. So far that means: 1) match "www" or "http" at the start of a word 2) keep matching until the word ends.

I can do that, AFAICT. I have this: \b(http|www).?[^\s]+

Which works on foo www.example.com bar http://www.example.com etc.

The problem is that if I give it foo www.example.com, http://www.example.com it thinks that the comma is a part of the URL.
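A minimal sketch that reproduces the behaviour described above, using the pattern and sample text from this question. The variable names ($pattern, $text, $found) are only illustrative.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/\b(http|www).?[^\s]+/';
// Sample input taken from the question.
$text = 'foo www.example.com, http://www.example.com';

preg_match_all($pattern, $text, $found);

// $found[0] holds the full matches. [^\s]+ consumes everything up to the
// next whitespace, so the first match keeps its trailing comma.
print_r($found[0]);
```

The first match comes back as "www.example.com," (comma included), while the second is "http://www.example.com".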

So, if I am to use one expression to do this, I need to change "...and stop when you see whitespace" to "...and stop when you see whitespace or a piece of punctuation right before whitespace". This is what I'm not sure how to do.

At the moment, a solution I'm thinking of running with is just adding a second step: match the URL, then on the next line move any trailing punctuation out of the match. This just isn't as elegant.

Note: I am writing this in PHP.

Aside: why does replacing \s with \b in the expression above not seem to work?


ETA:

Thanks everyone!

This is what I eventually ended up with, based on Explosion Pills's advice:

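function add_links( $string ) {
    // Anonymous callback: a named function declared here would be
    // redeclared (a fatal error) the second time add_links() is called.
    $replace = function ( $arr ) {
        // Prefix bare "www..." matches with a scheme so the href is absolute.
        $href = ( strncmp( "http", $arr[1], 4 ) == 0 ) ? $arr[1] : "http://" . $arr[1];
        // $arr[2] is the trailing punctuation, $arr[3] the whitespace (or end of string).
        return "<a href=\"$href\">$arr[1]</a>$arr[2]$arr[3]";
    };
    return preg_replace_callback( '/\b((?:http|www).+?)((?!\/)[\p{P}]+)?(\s|$)/x', $replace, $string );
}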

I added a callback so that all of the links would start with http://, and did some fiddling with the way it handles punctuation.

It's probably not the Best way to do things, but it works. I've learned a lot about this in the last little while, but there is still more to learn!
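To illustrate the punctuation handling, here is a small sketch comparing the pattern from the function above with a hypothetical variant that drops the (?!\/) lookahead. The sample text and the variable names are assumptions made for the demonstration.

```php
<?php
// Sample text (an assumption): a URL ending in "/" followed by a comma.
$text = 'see http://www.example.com/path/, then more';

// Pattern from the function above: the punctuation group may not start with "/".
$withLookahead = '/\b((?:http|www).+?)((?!\/)[\p{P}]+)?(\s|$)/x';
// Hypothetical variant without the lookahead, for comparison only.
$withoutLookahead = '/\b((?:http|www).+?)([\p{P}]+)?(\s|$)/x';

preg_match($withLookahead, $text, $a);
preg_match($withoutLookahead, $text, $b);

echo $a[1], "\n"; // http://www.example.com/path/
echo $b[1], "\n"; // http://www.example.com/path
```

Because "/" is itself in \p{P}, the variant without the lookahead treats a trailing slash as punctuation and pushes it out of the link. The lookahead keeps it inside the URL.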


  • dsegw3424 2013-06-05 05:30
    preg_replace('/
        \b       # Initial word boundary
        (        # Start capture
        (?:      # Non-capture group
        http|www # http or www (alternation)
        )        # end group
        .+?      # reluctant match for at least one character until...
        )        # End capture
        (        # Start capture
        [,.]+    # ...one or more of either a comma or period.
                 # add more punctuation as needed
        )?       # End optional capture
        (\s|$)   # Followed by either a space character or end of string
        /x', '<a href="\1">\1</a>\2\3', $string);
    

    ...is probably what you are going for. I think it's still imperfect, but it should at least work for your needs.
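For reference, a compact sketch of how this pattern might be called. It is written on one line here, so the /x flag is not needed. The sample text and variable names are assumptions.

```php
<?php
// Same pattern as in this answer, without the inline comments.
$pattern = '/\b((?:http|www).+?)([,.]+)?(\s|$)/';
// Sample input borrowed from the question.
$text = 'foo www.example.com, http://www.example.com';

// \1 is the URL, \2 the optional trailing punctuation, \3 the whitespace or end.
$linked = preg_replace($pattern, '<a href="\1">\1</a>\2\3', $text);

echo $linked, "\n";
// foo <a href="www.example.com">www.example.com</a>, <a href="http://www.example.com">http://www.example.com</a>
```

The comma ends up outside the anchor. The href for the bare "www" match still has no scheme, so a browser would treat it as a relative link.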

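    Aside: I think this is because \b is a zero-width word boundary that also matches next to punctuation (for example at each dot in a URL), so it cannot stand in for \s. Inside a character class such as [^\b], \b means the backspace character instead.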
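One way to check the aside, assuming \s was swapped for \b inside the character class (giving [^\b]+): in PCRE, \b inside a character class denotes the backspace character, not a word boundary. The sample text and variable names below are illustrative.

```php
<?php
$text = 'foo www.example.com bar';

// [^\s]+ stops at the first whitespace character.
preg_match('/\b(http|www).?[^\s]+/', $text, $withSpaceClass);

// Inside [...], \b is the backspace character (0x08), so [^\b]+ accepts
// spaces too and runs on to the end of the string.
preg_match('/\b(http|www).?[^\b]+/', $text, $withBackspaceClass);

echo $withSpaceClass[0], "\n";     // www.example.com
echo $withBackspaceClass[0], "\n"; // www.example.com bar
```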

    This answer was selected by the asker as the best answer.