duankun9280 2016-12-06 01:52

Non-greedy regex match behaves differently

I found that a non-greedy regex match only behaves non-greedily when anchored at the front, not at the end:

$ echo abcabcabc | perl -ne 'print $1 if /^(a.*c)/'
abcabcabc
# OK, greedy match

$ echo abcabcabc | perl -ne 'print $1 if /^(a.*?c)/'
abc
# YES! non-greedy match

Now look at this, when anchoring to the end:

$ echo abcabcabc | perl -ne 'print $1 if /(a.*c)$/'
abcabcabc
# OK, greedy match

$ echo abcabcabc | perl -ne 'print $1 if /(a.*?c)$/'
abcabcabc
# what, non-greedy become greedy?

Why is that? How come it doesn't print abc as before?

(I ran into the problem in my Go code, but illustrated it in Perl for simplicity.)
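Because the problem first came up in Go, here is a minimal Go sketch that runs the same four patterns through the standard `regexp` package. Go documents its matching as leftmost-first, which is the same preference Perl uses. It should therefore report the same four captures as the one-liners above: `abcabcabc`, `abc`, `abcabcabc`, `abcabcabc`. The helper name `group1` is invented for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// group1 is a hypothetical helper: it returns capture group 1 of the
// leftmost match of pattern in s, or "" if there is no match.
func group1(pattern, s string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	s := "abcabcabc"
	// Same four patterns as the Perl one-liners, in the same order.
	for _, p := range []string{`^(a.*c)`, `^(a.*?c)`, `(a.*c)$`, `(a.*?c)$`} {
		fmt.Printf("%-10s %s\n", p, group1(p, s))
	}
}
```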


  • dpi9530 2016-12-06 02:53
    $ echo abcabcabc | perl -ne 'print $1 if /(a.*?c)$/'
    abcabcabc
    # what, non-greedy become greedy?
    

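    Non-greedy means it'll match the fewest characters possible at the current location such that the entire pattern matches. It does not move the location itself. The engine tries start positions from left to right and stops at the first one where the whole pattern can match. A match beginning at position 0 therefore wins over one beginning at position 6, however long the first one is.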

    After matching a at position 0, bcabcab is the least .*? can match at position 1 while still satisfying the rest of the pattern.
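To make "at the current location" concrete, here is a small Go sketch that imitates the outer loop of a backtracking engine. It tries each start position from left to right, anchors the pattern there with `\A`, and keeps the first position that matches. `leftmostStart` is a hypothetical name. Offsets are byte offsets, so ASCII input is assumed.

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmostStart (hypothetical helper) roughly imitates how a backtracking
// engine scans for a start position. It tries each start index from left
// to right, anchors the pattern there with \A, and stops at the first index
// where the pattern matches. It returns that index and the matched text,
// or -1 and "" if no start index works. Assumes ASCII input.
func leftmostStart(pattern, s string) (int, string) {
	re := regexp.MustCompile(`\A(?:` + pattern + `)`)
	for start := 0; start <= len(s); start++ {
		if loc := re.FindStringIndex(s[start:]); loc != nil {
			return start, s[start : start+loc[1]]
		}
	}
	return -1, ""
}

func main() {
	start, m := leftmostStart(`a.*?c$`, "abcabcabc")
	fmt.Println(start, m) // 0 abcabcabc
}
```

Position 0 already succeeds, so position 6, where the shorter `abc` begins, is never tried.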

    "abcabcabc" =~ /a.*?c$/ in detail:

    1. At pos 0, a matches 1 char (a).
      1. At pos 1, .*? matches 0 chars (empty string).
        1. At pos 1, c fails to match. Backtrack!
      2. At pos 1, .*? matches 1 char (b).
        1. At pos 2, c matches 1 char (c).
          1. At pos 3, $ fails to match. Backtrack!
      3. At pos 1, .*? matches 2 chars (bc).
        1. At pos 3, c fails to match. Backtrack!
      4-7. ... (.*? grows to 3, 4, 5 and 6 chars; each attempt fails)
      8. At pos 1, .*? matches 7 chars (bcabcab).
        1. At pos 8, c matches 1 char (c).
          1. At pos 9, $ matches 0 chars (empty string). Match successful!
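The lazy trace above can be hand-coded for this one pattern. The sketch below fixes the start position, lets `.*?` grow one character at a time, and stops at the first length where `c` and then `$` both succeed. `lazyMatch` is a hypothetical name, and the sketch assumes ASCII input with no newlines.

```go
package main

import "fmt"

// lazyMatch (hypothetical helper) hand-codes /a.*?c$/ for a match attempt
// beginning at start. Like the lazy trace, .*? first takes 0 chars and grows
// by one char per backtrack. It returns how many chars .*? consumed, or -1
// if no match begins at start. Assumes ASCII input with no newlines.
func lazyMatch(s string, start int) int {
	if start >= len(s) || s[start] != 'a' {
		return -1 // `a` fails at this start position
	}
	for k := 0; start+1+k < len(s); k++ { // k = chars consumed by .*?
		cPos := start + 1 + k
		if s[cPos] == 'c' && cPos+1 == len(s) { // `c` matches, then `$` matches
			return k
		}
		// Otherwise backtrack: let .*? take one more char.
	}
	return -1
}

func main() {
	s := "abcabcabc"
	k := lazyMatch(s, 0)
	fmt.Println(k, s[1:1+k]) // 7 bcabcab
}
```

For start 0 it returns 7, meaning `.*?` consumed `bcabcab`. This is the same outcome as the last step of the trace.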

    "abcabcabc" =~ /a.*c$/ in detail (for contrast):

    1. At pos 0, a matches 1 char (a).
      1. At pos 1, .* matches 8 chars (bcabcabc).
        1. At pos 9, c fails to match. Backtrack!
      2. At pos 1, .* matches 7 chars (bcabcab).
        1. At pos 8, c matches 1 char (c).
          1. At pos 9, $ matches 0 chars (empty string). Match successful!
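The greedy trace can be hand-coded the same way. Only the order in which `.*` tries lengths changes: longest first, giving back one character per backtrack. `greedyMatch` is a hypothetical name, and the input is again assumed to be ASCII without newlines.

```go
package main

import "fmt"

// greedyMatch (hypothetical helper) hand-codes /a.*c$/ for a match attempt
// beginning at start. Like the greedy trace, .* first takes every remaining
// char and gives one back per backtrack. It returns how many chars .*
// consumed, or -1 if no match begins at start.
// Assumes ASCII input with no newlines.
func greedyMatch(s string, start int) int {
	if start >= len(s) || s[start] != 'a' {
		return -1 // `a` fails at this start position
	}
	for k := len(s) - start - 1; k >= 0; k-- { // k = chars consumed by .*
		cPos := start + 1 + k
		// `c` must match at cPos, then `$` must match right after it.
		if cPos < len(s) && s[cPos] == 'c' && cPos+1 == len(s) {
			return k
		}
	}
	return -1
}

func main() {
	s := "abcabcabc"
	k := greedyMatch(s, 0)
	fmt.Println(k, s[1:1+k]) // 7 bcabcab
}
```

Both orders settle on 7 characters for this input. `c$` pins the end of the match, and the match has already started at position 0, so greedy and lazy can only differ in the order they try lengths.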

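    Tip: Be wary of patterns that contain two non-greedy quantifiers. Unless they are there purely as an optimization, there's a good chance they will match something you didn't intend. This is relevant here because an unanchored pattern behaves as if it implicitly started with \G(?s:.*?)\K (unless that is cancelled out by a leading ^, \A or \G). So /a.*?c$/ effectively contains two non-greedy quantifiers, and the implicit leading one prefers the earliest possible start.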
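The implicit-prefix model in the tip can be checked in Go. This sketch makes two simplifications: `^` stands in for `\G`, since this is a single match from the start of the string, and a capture group stands in for `\K`. Writing the lazy prefix `(?s:.*?)` out explicitly should reproduce the unanchored result. A greedy `(?s:.*)` prefix instead makes the capture start as late as possible. `group1` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// group1 is a hypothetical helper: capture group 1 of the leftmost match, or "".
func group1(pattern, s string) string {
	if m := regexp.MustCompile(pattern).FindStringSubmatch(s); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	s := "abcabcabc"
	// Unanchored: the scan for a start position acts like an implicit
	// lazy prefix, so the earliest possible start wins.
	fmt.Println(group1(`(a.*?c)$`, s)) // abcabcabc
	// The same lazy prefix written out explicitly gives the same capture.
	fmt.Println(group1(`^(?s:.*?)(a.*?c)$`, s)) // abcabcabc
	// A greedy prefix pushes the start of the capture as far right as possible.
	fmt.Println(group1(`^(?s:.*)(a.*?c)$`, s)) // abc
}
```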

    What you want is one of the following:

    /a[^a]*c$/
    /a[^c]*c$/
    /a[^ac]*c$/
    

    You could also use one of the following:

    /a(?:(?!a).)*c$/s
    /a(?:(?!c).)*c$/s
    /a(?:(?!a|c).)*c$/s
    

    It would be inefficient and unreadable to use these latter three in this situation, but they will work with boundaries that are longer than one character.
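One caveat applies to the Go side of the question. Go's `regexp` package uses RE2 syntax, which has no lookahead, so `regexp.Compile` rejects the three `(?!...)` patterns instead of compiling them. Here is a minimal check; `compiles` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles (hypothetical helper) reports whether Go's regexp package
// accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The lookahead form is rejected; the negated-class form is accepted.
	for _, p := range []string{`a(?:(?!a).)*c$`, `a[^a]*c$`} {
		fmt.Printf("%-16s compiles: %v\n", p, compiles(p))
	}
}
```

In Go, the negated-class forms are therefore the usable ones.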

    This answer was accepted by the asker as the best answer.