REGEX (PCRE): match only on zero or one occurrence

I have the following problem.

Let's take the input (wikitext)

======hello((my first program)) world======

I want to match "hello", "my first program" and " world" (notice the space).

But for the input:

======hello(my first program)) world======

I want to match "hello(my first program" and " world".

In other words, I want to match any run of letters and spaces that may also contain single symbols, but never a run of two or more symbols.

This should be done with the unicode character properties like \p{L}, \p{S} or \p{Z}, as documented here.

Any ideas?

Addendum 1

The regex has just to stop before any double symbol or punctuation in unicode terms, that is, before any \p{S}{2,} or \p{P}{2,}.

I'm not trying to parse the whole wikitext with this, read my question carefully. The regex I'm looking for IS for the lexer I'm working on, and making it match such inputs will simplify my parser incredibly.

Addendum 2

The pattern must work with preg_match(). I can imagine how I'd have to split it first. Perhaps it would use some lookahead, I don't know, I've tried everything that I could imagine.

Using only preg_match() is a requirement set in stone by the current implementation of the lexer. It must be that way, because that's the natural way of how lexers work: they match sequences in the input stream.

Answer by dp518158, 2010-11-11 19:17
return preg_split('/([\pS\pP])\\1+/', $theString);

Result: http://www.ideone.com/YcbIf

(You need to get rid of the empty strings manually.)
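If you'd rather not strip the empty strings by hand, preg_split() can do it for you via the PREG_SPLIT_NO_EMPTY flag. A minimal sketch; the function name splitOnDoubledSymbols is made up for illustration, and the /u modifier is my addition on the assumption that the input is UTF-8:

```php
<?php
// Hypothetical helper: split on any run of two or more identical
// symbol/punctuation characters and drop the empty pieces.
function splitOnDoubledSymbols(string $text): array
{
    // /u is an assumption (UTF-8 input); the original pattern has no modifier.
    $parts = preg_split('/([\pS\pP])\\1+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (e.g. invalid UTF-8 with /u).
    return $parts === false ? [] : $parts;
}

print_r(splitOnDoubledSymbols('======hello((my first program)) world======'));
print_r(splitOnDoubledSymbols('======hello(my first program)) world======'));
```

For the two inputs in the question this should give "hello", "my first program", " world" and "hello(my first program", " world" respectively.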

Edit: as a preg_match regex:

'/(?:^|([\pS\pP])\\1+)((?:[^\pS\pP]|([\pS\pP])(?!\\3))*)/'

Take the 2nd capture group whenever it is matched (it can be empty, e.g. at the very start or after the trailing ======). Example: http://www.ideone.com/ErTVA
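To show what "take the 2nd capture group" means in practice, here is a rough sketch that runs the pattern over the whole string with preg_match_all() and keeps only the non-empty second groups. The function name matchSegments and the /u modifier are assumptions of mine, not part of the pattern above:

```php
<?php
// Hypothetical helper: collect capture group 2 from every match.
function matchSegments(string $text): array
{
    // Same pattern as above, plus /u (assumes UTF-8 input).
    $pattern = '/(?:^|([\pS\pP])\\1+)((?:[^\pS\pP]|([\pS\pP])(?!\\3))*)/u';

    $segments = [];
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Group 2 is empty when nothing follows a doubled symbol run.
            if (isset($m[2]) && $m[2] !== '') {
                $segments[] = $m[2];
            }
        }
    }
    return $segments;
}

print_r(matchSegments('======hello((my first program)) world======'));
print_r(matchSegments('======hello(my first program)) world======'));
```

A lexer that calls preg_match() once per token would do the same thing one match at a time, skipping any match whose second group is empty.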

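But since your lexer is going to use more than one regex anyway, couldn't you just consume ([\pS\pP])\\1+ and discard it, or, if that doesn't match, consume (?:[^\pS\pP]|([\pS\pP])(?!\\1))* and record it? (Used on its own, the second pattern has only one capture group, so the backreference becomes \\1 instead of \\3.)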
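The two-regex idea could look roughly like this. It is only a sketch of how such a loop might be wired up, not how your lexer actually works: the function name lexSegments, the \G anchors, the /u modifier and the use of + instead of * (to avoid empty matches that would never advance the offset) are all my assumptions.

```php
<?php
// Hypothetical lexer loop: discard doubled symbol runs, record the rest.
function lexSegments(string $text): array
{
    // \G anchors each attempt at the current offset.
    $doubled = '/\G([\pS\pP])\\1+/u';
    // Standalone version of the segment pattern, so the backreference is \1.
    $segment = '/\G(?:[^\pS\pP]|([\pS\pP])(?!\\1))+/u';

    $tokens = [];
    $offset = 0;
    $length = strlen($text); // preg_match() offsets are in bytes

    while ($offset < $length) {
        if (preg_match($doubled, $text, $m, 0, $offset)) {
            $offset += strlen($m[0]);   // consume and discard
        } elseif (preg_match($segment, $text, $m, 0, $offset)) {
            $tokens[] = $m[0];          // consume and record
            $offset += strlen($m[0]);
        } else {
            break;                      // e.g. invalid UTF-8; stop instead of looping forever
        }
    }
    return $tokens;
}

print_r(lexSegments('======hello((my first program)) world======'));
print_r(lexSegments('======hello(my first program)) world======'));
```

Each preg_match() call here matches exactly one token at the current position, which is the shape a lexer normally wants.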
(This answer was accepted by the asker.)