How can I extend a regular expression to find multiple matches?

This is my current regex (used in parsing an iCal file):

/(.*?)(?:;(?=(?:[^"]*"[^"]*")*[^"]*$))([\w\W]*)/

The current output using preg_match() is this:

//Output 1 - `preg_match()`
Array
(
    [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London"
    [1] => VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb
)

I would like to extend my regex to output this (i.e. find multiple matches):

//Output 2
Array
(
    [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London"
    [1] => VALUE=DATE
    [2] => RSVP=FALSE
    [3] => LANGUAGE=en-gb
)    

The regex should search for each semicolon not contained within a quoted substring and provide that as a match.


I cannot simply switch to preg_match_all(), because it gives this unwanted output:

//Output 3 - `preg_match_all()`
Array
(
    [0] => Array
        (
            [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb
        )

    [1] => Array
        (
            [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London"
        )

    [2] => Array
        (
            [0] => VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb
        )

)

dpp66953: I know. That is why I am asking the question. (more than 5 years ago)
dongruyan4948: You need to update your regex. (more than 5 years ago)
doufu7464: It is not as simple as just swapping to preg_match_all. (more than 5 years ago)
douyue1926: preg_match_all (more than 5 years ago)

3 answers




(.+?)(?:;(?=(?:[^"]*"[^"]*")*[^"]*$)|$)

Try this. See the demo:

https://regex101.com/r/pG1kU1/18
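
As a rough sketch of how this pattern could be used from PHP (the `/` delimiters, the variable names, and the use of `preg_match_all()` are assumptions here, not part of the answer), applied to the sample string from the question:

```php
<?php
// Sample property string taken from the question.
$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';

// Pattern from this answer, wrapped in / delimiters (an assumption;
// any delimiter that does not occur in the pattern would do).
$pattern = '/(.+?)(?:;(?=(?:[^"]*"[^"]*")*[^"]*$)|$)/';

if (preg_match_all($pattern, $str, $matches)) {
    // $matches[0] holds the full matches, including any trailing semicolon;
    // $matches[1] holds capture group 1, i.e. each field without it.
    print_r($matches[1]);
}
```

With this input, `$matches[1]` should contain the four fields listed in Output 2 of the question.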

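dqwh0109: Thanks @vks, I appreciate your input. This is a new question because another part of the parser uses explode(), and I thought it should be smarter and use a regex. Your answer works well again. (more than 5 years ago)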




You need to use preg_match_all to get all the matches in the string.

The pattern you use isn't designed to return several results, since [\w\W]* matches everything up to the end of the string.
But that is only one of your problems: a pattern designed like this needs to check, for each semicolon, whether the number of quotes up to the end of the string is odd or even: (?=(?:[^"]*"[^"]*")*[^"]*$). Just imagine how many times the whole string is scanned by this lookahead.

To avoid the problem, you can use a different approach that doesn't try to find semicolons, but instead describes everything that is not a semicolon: you are looking for every run of text that contains no quotes or semicolons, plus any quoted parts, whatever their content.

You can use this kind of pattern:

$pattern = '~[^\r\n";]+(?:"[^"\\\]*(?:\\\.[^"\\\]*)*"[^\r\n";]*)*~';

if (preg_match_all($pattern, $str, $matches))
    print_r($matches[0]);

pattern details:

~                      # pattern delimiter
[^\r\n";]+             # everything that is not a newline, a double quote or a semicolon
(?:                    # non-capturing group: includes any quoted parts
    "                  # a literal quote
    [^"\\\]*           # everything that is not a quote or a backslash
    (?:\\\.[^"\\\]*)*  # optional group to deal with escaped characters
    "                  # closing quote
    [^\r\n";]*         # more unquoted text
)*                     # repeat zero or more times
~

demo
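
The pattern details above can also be written as runnable code by using the `x` (extended) modifier, which lets the pattern carry its own comments. This is only a sketch: the `\r\n` in the character classes is a reconstruction based on the "not a newline" description, and the variable names and the second test string with escaped quotes are made up for illustration.

```php
<?php
// Sample property string taken from the question.
$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';

// With the x modifier, whitespace and # comments inside the pattern are ignored.
// In a single-quoted PHP string, \\\\ reaches the regex engine as \\,
// which matches one literal backslash.
$pattern = '~
    [^\r\n";]+                 # unquoted run: no newline, quote or semicolon
    (?:
        "                      # opening quote
        [^"\\\\]*              # content without quotes or backslashes
        (?: \\\\. [^"\\\\]* )* # escaped character, then more content
        "                      # closing quote
        [^\r\n";]*             # rest of the unquoted run
    )*
~x';

preg_match_all($pattern, $str, $matches);
print_r($matches[0]);

// Hypothetical input with escaped quotes inside the quoted value.
$escaped = 'NAME="a \"quoted; part\" here";X=1';
preg_match_all($pattern, $escaped, $escapedMatches);
print_r($escapedMatches[0]);
```

Because each field is consumed in a single left-to-right pass, no lookahead has to rescan the rest of the string at every semicolon.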

douyinliu8813: Works well, smart answer. Upvoting. Thanks for your help. (more than 5 years ago)




You can use the following to match:

(.*?(?:;|$))(?![^"]*")

See DEMO

or split by:

;(?![^"]*")

See DEMO
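
A minimal sketch of the split variant with `preg_split()` (the `/` delimiters, variable names, and the second input string are assumptions added for illustration):

```php
<?php
// Sample property string taken from the question.
$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';

// Split on every semicolon that has no double quote anywhere after it.
$parts = preg_split('/;(?![^"]*")/', $str);
print_r($parts);

// Hypothetical input where the quoted value comes last:
// every semicolon is followed by a quote, so nothing is split.
$unsplit = preg_split('/;(?![^"]*")/', 'A=1;B="x;y"');
print_r($unsplit);
```

For the sample string this yields the four fields from Output 2. Note that the lookahead only checks whether any quote follows, so this form appears to rely on the quoted value being the first field; the second call returns the whole string as a single element.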
