doulu1914 2014-05-28 02:22

Nested greedy quantifier does not match

I have noticed some strange behaviour with a PCRE regular expression I can't explain. I would expect the code:

preg_match('!^.+?(?:/programs/([^?#]+))?.*?$!',
    'http://example.com/programs/drive', $matches);

to return "drive" as match 1. The [^?#]+ and the ? after the non-capturing group are both greedy, so why doesn't [^?#]+ take precedence and match drive? Instead, testing revealed that the .+? at the start matches only the h and the .*? at the end matches the rest of the URL.

By contrast, the code:

preg_match('!^.+?(?:/programs/([^?#]+).*)?$!',
     'http://example.com/programs/drive', $matches);

works as expected and returns drive as match 1.

1 answer

  • duanjiagu0655 2014-05-28 03:04
    What's happening is this. The first .+? is applied at the start of the string. It is lazy but must match at least one character, so it takes the h in http and stops. The optional group (?:/programs/([^?#]+))? is then tested at the next position, against the t. It fails to match there, and because the whole group is optional, it matches nothing and the engine moves on. Finally, the .*?$ at the end of the pattern is applied, and it is able to match all the remaining characters in the string for a successful match. Because the overall match succeeds on this first path, the engine never backtracks to retry the group further along the string.
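
The behaviour described above can be checked with a small script. This is only a sketch: the URL and both patterns are copied from the question, while the variable names ($first, $second, $firstCapture, $secondCapture) are illustrative and not part of the original post.

```php
<?php
// Subject string and both patterns are taken verbatim from the question.
$url = 'http://example.com/programs/drive';

// Pattern 1: the optional group is followed by a lazy ".*?" that can
// absorb the rest of the string once the group has matched nothing.
$first = '!^.+?(?:/programs/([^?#]+))?.*?$!';

// Pattern 2: the trailing ".*" sits inside the optional group, so only
// "$" follows the group and an empty group cannot reach the end.
$second = '!^.+?(?:/programs/([^?#]+).*)?$!';

preg_match($first, $url, $m1);
preg_match($second, $url, $m2);

// In pattern 1 the capture group never participates, so index 1 is
// absent from $m1; the null-coalescing operator avoids a notice.
$firstCapture = $m1[1] ?? null;
$secondCapture = $m2[1] ?? null;

var_dump($firstCapture);   // NULL
var_dump($secondCapture);  // string(5) "drive"
```

In the second pattern, skipping the group straight after the h leaves only $ to match, which fails because the string has not ended. The engine therefore backtracks and lets .+? grow one character at a time until the group can match at /programs/, which is why drive is captured there.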
