Regular expressions: how do I match any string up to whitespace, or up to punctuation followed by whitespace?

I'm trying to write a regular expression which will find URLs in a plain-text string, so that I can wrap them with anchor tags. I know there are expressions already available for this, but I want to create my own, mostly because I want to know how it works.

Since it's not going to break anything if my regex fails, my plan is to write something fairly simple. So far that means: 1) match "www" or "http" at the start of a word 2) keep matching until the word ends.

I can do that, AFAICT. I have this: \b(http|www).?[^\s]+

This works on input such as "foo www.example.com bar http://www.example.com".

The problem is that if I give it "foo www.example.com, http://www.example.com", it thinks that the comma is part of the URL.
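
To make the problem concrete, here is a minimal sketch that runs the pattern from the question against that input. The "/" delimiters and the variable names are my own choices, not part of the original post.

```php
<?php
// Pattern from the question, wrapped in "/" delimiters (an assumption;
// any delimiter that does not occur in the pattern would do).
$pattern = '/\b(http|www).?[^\s]+/';
$text    = 'foo www.example.com, http://www.example.com';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches. [^\s]+ happily consumes the comma,
// because a comma is not whitespace.
print_r($matches[0]);
// First match:  "www.example.com,"  (comma included)
// Second match: "http://www.example.com"
```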

So, if I am to use one expression to do this, I need to change "...and stop when you see whitespace" to "...and stop when you see whitespace or a piece of punctuation right before whitespace". This is what I'm not sure how to do.

At the moment, the solution I'm thinking of running with is just adding a second step: matching the URL, and then on the next line moving any sneaky trailing punctuation out of the match. This just isn't as elegant.

Note: I am writing this in PHP.

Aside: why does replacing \s with \b in the expression above not seem to work?

ETA:

Thanks everyone!

This is what I eventually ended up with, based on Explosion Pills's advice:

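function add_links( $string ) {
    // A closure avoids redeclaring a named function every time add_links() is called.
    $replace = function ( $arr ) {
        // $arr[1] is the URL, $arr[2] any trailing punctuation, $arr[3] the whitespace or end of string.
        if ( strncmp( "http", $arr[1], 4 ) == 0 ) {
            return "<a href=\"$arr[1]\">$arr[1]</a>$arr[2]$arr[3]";
        } else {
            // Bare "www." links get an http:// prefix so the href is absolute.
            return "<a href=\"http://$arr[1]\">$arr[1]</a>$arr[2]$arr[3]";
        }
    };
    return preg_replace_callback( '/\b((?:http|www).+?)((?!\/)[\p{P}]+)?(\s|$)/x', $replace, $string );
}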

I added a callback so that all of the links would start with http://, and did some fiddling with the way it handles punctuation.
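
As a rough illustration of that punctuation handling, the sketch below runs the same pattern with preg_match and prints the URL group and the trailing-punctuation group separately. The sample strings and the $results array are made up for this example.

```php
<?php
// Same pattern as in add_links(); the sample inputs are hypothetical.
$pattern = '/\b((?:http|www).+?)((?!\/)[\p{P}]+)?(\s|$)/x';
$samples = ['www.example.com/path/, then', 'http://example.com!? done'];

$results = [];
foreach ($samples as $text) {
    preg_match($pattern, $text, $m);
    // $m[1] is the URL, $m[2] the punctuation that was split off.
    $results[] = [$m[1], $m[2]];
    printf("url=%s trailing=%s\n", $m[1], $m[2]);
}
// Prints:
// url=www.example.com/path/ trailing=,
// url=http://example.com trailing=!?
```

The (?!\/) lookahead stops a trailing slash from being treated as punctuation, so it stays in the URL, while the comma and the "!?" are pushed outside the link.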

It's probably not the Best way to do things, but it works. I've learned a lot about this in the last little while, but there is still more to learn!

4 answers

dsegw3424 2013-06-05 05:30
preg_replace('/
    \b              # Initial word boundary
    (               # Start capture
      (?:           # Non-capture group
        http|www    # http or www (alternation)
      )             # end group
      .+?           # reluctant match for at least one character until...
    )               # End capture
    (               # Start capture
      [,.]+         # ...one or more of either a comma or period.
                    # add more punctuation as needed
    )?              # End optional capture
    (\s|$)          # Followed by either a space character or end of string
/x', '<a href="\1">\1</a>\2\3', $string); // $string: the input text, as named in the question

...is probably what you are going for. I think it's still imperfect, but it should at least work for your needs.
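
For a quick check, the same pattern can be written on one line (without the /x comments) and run against a sample. The sample text and variable names here are assumptions for illustration only.

```php
<?php
// One-line form of the commented pattern above.
$pattern = '/\b((?:http|www).+?)([,.]+)?(\s|$)/';
$text    = 'foo www.example.com, http://www.example.com.';

// \1 is the URL, \2 the optional trailing comma or period, \3 the whitespace (or nothing at end of string).
$linked = preg_replace($pattern, '<a href="\1">\1</a>\2\3', $text);

echo $linked, "\n";
// foo <a href="www.example.com">www.example.com</a>, <a href="http://www.example.com">http://www.example.com</a>.
```

Because .+? is reluctant, it stops at the first point where optional punctuation followed by whitespace (or the end of the string) can match, which is why the comma and the final period end up outside the anchor tags.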

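Aside: I think this is because \b matches at punctuation too (it is a zero-width boundary between a word character and a non-word character). Note also that inside a character class such as [^\b], \b stands for the backspace character rather than a word boundary.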
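
A small sketch of that aside, assuming the \s was swapped for \b inside the character class (the sample string is hypothetical):

```php
<?php
$text = 'foo www.example.com, bar';

// Inside [...], \b means the backspace character (0x08), so [^\b]+ matches
// everything except backspace, including spaces, and runs to the end.
preg_match('/\b(http|www).?[^\b]+/', $text, $m);
echo $m[0], "\n";
// www.example.com, bar

// Confirming the meaning of \b inside a class: it matches a literal backspace.
$matchesBackspace = preg_match('/[\b]/', "\x08");
var_dump($matchesBackspace === 1);
// bool(true)
```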
This answer was selected by the asker as the best answer.
