if-else not working as expected in a recursive regex

I am using a regex to parse some BBCode, so the regex has to work recursively to also match tags nested inside other tags. Most of the BBCode tags take an argument, which is sometimes quoted and sometimes not.

A simplified equivalent of the regex I'm using (with HTML-style tags to reduce the escaping needed) is this:

'~<(\")?a(?(1)\1)> #Match the tag, and require a closing quote if an opening one provided
  ([^<]+ | (?R))* #Match the contents of the tag, including recursively
</a>~x'

However, if I have a test string that looks like this:

<"a">Content<a>Also Content</a></a>

it only matches <a>Also Content</a>. When matching starts at the first tag, group 1 captures ", and that value is not overwritten when the pattern recurses to match the inner tag. The inner tag is unquoted, so (\")? does not participate there, \1 still holds the outer ", the conditional (?(1)\1) demands a closing quote, and the inner match (and with it the outer one) fails.

If I consistently use quotes, or consistently omit them, it works fine, but I can't be sure that will be the case with the content I have to parse. Is there any way to work around this?

The full regex that I'm using, to match [spoiler]content[/spoiler], [spoiler=option]content[/spoiler] and [spoiler="option"]content[/spoiler], is

"~\[spoiler\s*+ #Match the opening tag
            (?:=\s*+(\"|\')?((?(1)(?!\\1).|[^\]]){0,100})(?(1)\\1))?+\s*\] #If an option exists, match that
          (?:\ *(?:\n|<br />))?+ #Get rid of an extra new line before the start of the content if necessary
          ((?:[^\[\n]++ #Capture all characters until the closing tag
            |\n(?!\[spoiler]) #Capture new line separately so backtracking doesn't run away due to above
            |\[(?!/?spoiler(?:\s*=[^\]*])?) #Also match all tags that aren't spoilers
            |(?R))*+) #Allow the pattern to recurse - we also want to match spoilers inside spoilers,
                     # without messing up nesting
          \n? #Get rid of an extra new line before the closing tag if necessary
          \[/spoiler] #match the closing tag
         ~xi"

There are a couple of other bugs in it as well, though.
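The option-matching part of this pattern does not recurse, so it can be checked on its own. Below is a minimal sketch that transcribes just the opening tag into Python's re module, which supports the (?(1)...) conditional and backreferences inside lookaheads but has no (?R); the possessive quantifiers are dropped, and the names OPENING_TAG and option_of are hypothetical. Without recursion the conditional behaves as intended, which supports the diagnosis that the trouble only starts once (?R) re-enters the pattern.

```python
import re

# Opening-tag part of the spoiler regex above, transcribed for Python's re module.
# Possessive quantifiers (?+, *+) are dropped; (?R) is not needed for this part.
OPENING_TAG = re.compile(r"""
    \[spoiler\s*
    (?:=\s*(["'])?                        # group 1: optional opening quote
       ((?(1)(?!\1).|[^\]]){0,100})       # group 2: the option; if quoted, stop at the same quote
       (?(1)\1)                           # if a quote was opened, require it to be closed
    )?\s*\]
""", re.VERBOSE | re.IGNORECASE)

def option_of(tag):
    """Return the option of a [spoiler...] opening tag, or None if it has no option."""
    m = OPENING_TAG.fullmatch(tag)
    if m is None:
        raise ValueError("not a spoiler opening tag: " + tag)
    return m.group(2)

print(option_of('[spoiler]'))                # None
print(option_of('[spoiler=plain]'))          # plain
print(option_of('[spoiler="two words"]'))    # two words
print(option_of("[spoiler='say \"hi\"']"))   # say "hi"
```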

Answer by dongyuanliao6204, 2015-06-27 12:58:
The simplest solution is to use alternatives instead:

<(?:a|"a")> ([^<]++ | (?R))* </a>

But if you really don't want to repeat the a in the pattern, you can do the following:

<("?)a\1> ([^<]++ | (?R))* </a>


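I've just moved the ? inside the group, so (\")? becomes ("?). The capturing group now always participates in the match, though what it matches can be empty, so every recursion level sets \1 for itself instead of seeing the outer level's quote, and the conditional isn't needed anymore.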
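The difference this fix relies on is visible even without recursion. Here is a quick check in Python, whose re module has no (?R), so it shows only group participation and not the recursive case itself: with (")? an unquoted tag leaves group 1 unset (None), while ("?) always participates and captures an empty string. The pattern names are hypothetical.

```python
import re

# Question's form: the quote group is optional, so it may not take part in the match at all.
OPTIONAL_GROUP = re.compile(r'<(")?a(?(1)\1)>')
# Answer's form: the group always takes part; it simply matches "" when there is no quote.
EMPTY_GROUP = re.compile(r'<("?)a\1>')

for text in ['<"a">', '<a>']:
    m1 = OPTIONAL_GROUP.fullmatch(text)
    m2 = EMPTY_GROUP.fullmatch(text)
    print(repr(text), repr(m1.group(1)), repr(m2.group(1)))
# prints:
# '<"a">' '"' '"'
# '<a>' None ''
```

Both forms still reject a mismatched quote such as <"a>; they differ only in whether the group is unset or set to an empty string.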

Side note: I've applied a possessive quantifier to [^<] to avoid catastrophic backtracking.

In your case I believe it's better to match a generic tag rather than a specific one. Match all tags, and then decide in your code what to do with the match.

Here's a full regex:

\[ (?<tag>\w+) \s* (?:=\s* (?: (?<quote>["']) (?<arg>.{0,100}?) \k<quote> | (?<arg>[^\]]+) ) )? \] (?<content> (?:[^[]++ | (?R) )*+ ) \[/\k<tag>\]


Note that I added the J option (PCRE_DUPNAMES) to be able to use (?<arg>...) twice.
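Taking the advice to match tags generically and decide in code one step further, here is a hedged sketch of a different route: a non-recursive regex finds each opening or closing tag, and a stack in ordinary code handles the nesting, so each level keeps its own argument and quote state by construction. This is a swapped-in technique, not the answer's recursive regex. It is written in Python, whose re module rejects duplicate group names, hence the quoted argument goes to qarg and the unquoted one to arg. The names TAG, Node and parse are hypothetical, and the only error handling is raising ValueError on mismatched or unclosed tags.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# One opening or closing tag. Python's re has no duplicate group names, so a quoted
# argument lands in "qarg" and an unquoted one in "arg".
TAG = re.compile(r"""
    \[ (?P<close>/)? (?P<tag>\w+) \s*
    (?: = \s* (?: (?P<quote>["']) (?P<qarg>.{0,100}?) (?P=quote) | (?P<arg>[^\]]+) ) )?
    \]
""", re.VERBOSE)

@dataclass
class Node:
    tag: str
    arg: Optional[str]
    children: list = field(default_factory=list)  # text pieces (str) and nested Nodes

def parse(text):
    """Build a tag tree; nesting is tracked with a stack instead of (?R)."""
    root = Node("root", None)
    stack = [root]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:  # plain text between the previous tag and this one
            stack[-1].children.append(text[pos:m.start()])
        name = m.group("tag").lower()
        if m.group("close"):
            if len(stack) == 1 or stack[-1].tag != name:
                raise ValueError(f"unexpected closing tag [/{name}] at {m.start()}")
            stack.pop()
        else:
            # Each opening tag records its own argument, quoted or not.
            arg = m.group("qarg") if m.group("quote") else m.group("arg")
            node = Node(name, arg)
            stack[-1].children.append(node)
            stack.append(node)
        pos = m.end()
    if len(stack) > 1:
        raise ValueError(f"unclosed tag [{stack[-1].tag}]")
    if pos < len(text):
        root.children.append(text[pos:])
    return root
```

For example, parse('[spoiler="a"]Content[spoiler]Also Content[/spoiler][/spoiler]') yields an outer spoiler node with argument a containing the text Content and an inner spoiler node with no argument, which is exactly the mixed-quoting case from the question. Rendering each node is then ordinary code, so non-spoiler tags can be passed through or handled separately.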
This answer was accepted by the asker as the best answer.

Question tags: php · asked 2015-06-27 12:49