duankun9280 2016-12-06 01:52

Non-greedy regular expression match behaves differently

I found that a non-greedy regex match only behaves non-greedily when anchored to the front, not to the end:

$ echo abcabcabc | perl -ne 'print $1 if /^(a.*c)/'
abcabcabc
# OK, greedy match

$ echo abcabcabc | perl -ne 'print $1 if /^(a.*?c)/'
abc
# YES! non-greedy match

Now look at this, when anchoring to the end:

$ echo abcabcabc | perl -ne 'print $1 if /(a.*c)$/'
abcabcabc
# OK, greedy match

$ echo abcabcabc | perl -ne 'print $1 if /(a.*?c)$/'
abcabcabc
# what, non-greedy become greedy?

Why is that? How come it doesn't print abc as before?

(The problem was found in my Go code, but illustrated in Perl for simplicity).

1 answer

  • dpi9530 2016-12-06 02:53
    $ echo abcabcabc | perl -ne 'print $1 if /(a.*?c)$/'
    abcabcabc
    # what, non-greedy become greedy?
    

    Non-greedy means it'll match the fewest characters possible at the current location such that the entire pattern matches.

    After matching a at position 0, bcabcab is the least .*? can match at position 1 while still satisfying the rest of the pattern.

    "abcabcabc" =~ /a.*?c$/ in detail:

    1. At pos 0, a matches 1 char (a).
      1. At pos 1, .*? matches 0 chars (empty string).
        1. At pos 1, c fails to match. Backtrack!
      2. At pos 1, .*? matches 1 char (b).
        1. At pos 2, c matches 1 char (c).
          1. At pos 3, $ fails to match. Backtrack!
      3. At pos 1, .*? matches 2 chars (bc).
        1. At pos 3, c fails to match. Backtrack!
      4. ...
      5. At pos 1, .*? matches 7 chars (bcabcab).
        1. At pos 8, c matches 1 char (c).
          1. At pos 9, $ matches 0 chars (empty string). Match successful!
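The steps above can be mimicked by hand. Here is a minimal Go sketch (Go only because the question says the problem came from Go code). It is not how any real engine is implemented: `lazyMatch` is a hypothetical helper hard-coded for a.*?c$, and it lets . match any byte. It shows that the start position is tried leftmost first, and that .*? only grows as far as it must from that start.

```go
package main

import "fmt"

// lazyMatch is a hypothetical, hard-coded simulation of /a.*?c$/ on s.
// It mirrors the trace: try the leftmost start first, and at that start
// let .*? grow one character at a time until the rest of the pattern fits.
func lazyMatch(s string) (string, bool) {
	for start := 0; start < len(s); start++ {
		if s[start] != 'a' {
			continue // the literal a must match at the start position
		}
		for n := 0; start+1+n < len(s); n++ { // n = chars taken by .*?
			p := start + 1 + n // position where c must match
			if s[p] == 'c' && p+1 == len(s) { // c matches and $ holds
				return s[start : p+1], true
			}
		}
	}
	return "", false
}

func main() {
	m, ok := lazyMatch("abcabcabc")
	fmt.Println(m, ok) // abcabcabc true
}
```

The start at position 0 succeeds once .*? has grown to bcabcab, so the later starts at positions 3 and 6 are never tried.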

    "abcabcabc" =~ /a.*c$/ in detail (for contrast):

    1. At pos 0, a matches 1 char (a).
      1. At pos 1, .* matches 8 chars (bcabcabc).
        1. At pos 9, c fails to match. Backtrack!
      2. At pos 1, .* matches 7 chars (bcabcab).
        1. At pos 8, c matches 1 char (c).
          1. At pos 9, $ matches 0 chars (empty string). Match successful!

    Tip: Avoid patterns with two instances of a non-greediness modifier. Unless you are using them as an optimization, there's a good chance they can match something you don't want them to match. This is relevant here because patterns implicitly start with \G(?s:.*?)\K (unless cancelled out by a leading ^, \A or \G).
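To see the implicit lazy prefix at work, here is a small Go sketch that writes it out explicitly. It leaves out \G and \K, which as far as I know Go's regexp syntax does not provide, and uses a capture group instead. `firstGroup` is a hypothetical helper, not part of any library. Go's engine does not backtrack, but its documentation says it picks the same leftmost-first match a backtracking engine would, so the results should agree with Perl for these patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroup returns capture group 1 of pattern on s, or "" if there is no match.
func firstGroup(pattern, s string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	s := "abcabcabc"
	fmt.Println(firstGroup(`(a.*?c)$`, s))         // unanchored: abcabcabc
	fmt.Println(firstGroup(`(?s:.*?)(a.*?c)$`, s)) // lazy prefix written out: abcabcabc
	fmt.Println(firstGroup(`^(a.*?c)`, s))         // anchored at the front: abc
}
```

The first two patterns capture the same text, which is the point: the prefix prefers to skip zero characters, so the a at position 0 wins and the second .*? has to stretch to reach the end.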

    What you want is one of the following:

    /a[^a]*c$/
    /a[^c]*c$/
    /a[^ac]*c$/
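
Since the question mentions Go, here is a quick sketch checking these three patterns with Go's regexp package. `find` is a hypothetical wrapper for illustration. One thing to watch: without the m flag, $ in Go matches only at the very end of the text, so a trailing newline (as echo produces) would have to be trimmed first.

```go
package main

import (
	"fmt"
	"regexp"
)

// find returns the text matched by pattern in s, or "" if nothing matches.
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "abcabcabc" // no trailing newline
	for _, p := range []string{`a[^a]*c$`, `a[^c]*c$`, `a[^ac]*c$`} {
		fmt.Println(p, "->", find(p, s)) // each prints abc
	}
	fmt.Println(`a.*?c$`, "->", find(`a.*?c$`, s)) // abcabcabc, for contrast
}
```

The negated classes stop the match from running over an earlier a or c, so the starts at positions 0 and 3 fail and only the last abc satisfies $.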
    

    You could also use one of the following:

    /a(?:(?!a).)*c$/s
    /a(?:(?!c).)*c$/s
    /a(?:(?!a|c).)*c$/s
    

    It would be inefficient and unreadable to use these latter three in this situation, but they will work with boundaries that are longer than one character.
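
A note for the Go side of the question: Go's regexp package does not support lookahead, so the three patterns above will not compile there. For boundaries longer than one character, one possible substitute is plain string searching. The sketch below is a different technique, not a translation of the regex, and `tailBetween`, `begin` and `end` are hypothetical names. It roughly corresponds to the first lookahead variant (no further begin inside the match) and ignores edge cases where the two boundaries overlap.

```go
package main

import (
	"fmt"
	"strings"
)

// tailBetween returns the shortest substring of s that starts with begin
// and ends with end at the very end of s, or false if there is none.
func tailBetween(s, begin, end string) (string, bool) {
	if !strings.HasSuffix(s, end) {
		return "", false // the $ anchor: s must finish with end
	}
	body := s[:len(s)-len(end)]
	i := strings.LastIndex(body, begin) // last begin before the final end
	if i < 0 {
		return "", false
	}
	return s[i:], true
}

func main() {
	fmt.Println(tailBetween("<<a>><<b>>", "<<", ">>")) // <<b>> true
	fmt.Println(tailBetween("abcabcabc", "a", "c"))    // abc true
}
```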
