douqiao7958 2016-01-28 11:10

Greediness problem in a regular expression

In PHP, I'm matching the text at http://siba.thenetworksolution.it/allegati/H3018500D7FDDE9ACA05671F49F4F3746A69DAF96.1329514.pdf.txt against the following regex:

preg_match('#(.*(?s))(particella |particelle |p\.|part\.|p |part |mappale |mapp\.|mapp |n\.|\*)\s*(\d+[\d /\p{Pd}]*)($|.{0,20}(?s)(graffati|particella |particelle |p\.|.*part\.|p |part |mappale |mapp\.|mapp |n\.|subalterno |subalterni |sub\.|s\.|sub |s |\bcat\b|\bcategoria\b|\brendita\b|\bvani\b|\bconsistenza\b|\bR\.C\.\b))#i', $txt, $matches, PREG_OFFSET_CAPTURE, $offset)

with offset = 1155 (that is the offset of the word "foglio" in the text).

I expected it to match 454 (which comes just after the offset), but it matches 57/1998 instead (which is many lines later).

After some tests on regex101.com, I found that the issue is the carriage return between the prefix particella and 454, but I expected \s to match line feeds.

How can I correct the greediness so that the regex matches 454?


1 answer

  • duanjiwu0324 2016-01-28 11:37

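    Solved. There was a trailing space after particella in the second group. That literal space has to match before \s* is even tried, so the alternative fails whenever particella is followed directly by a line break.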
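
To make the effect of the trailing space visible, here is a minimal sketch. It uses a heavily simplified version of the pattern and a made-up three-line sample string; both are assumptions for illustration, not the original document or the full regex from the question.

```php
<?php
// Hypothetical sample text: "particella" is followed directly by a line break.
$txt = "foglio 12 particella\n454 sub. 3\nn. 57/1998 cat. A/2";

// Simplified pattern with the trailing literal space, as in the question.
$withSpace = '#(particella |n\.)\s*(\d+[\d /\p{Pd}]*)#i';

// Same pattern without the trailing space; \s* can now absorb the line break.
$withoutSpace = '#(particella|n\.)\s*(\d+[\d /\p{Pd}]*)#i';

preg_match($withSpace, $txt, $m1);
preg_match($withoutSpace, $txt, $m2);

// The number group may capture a trailing space, so trim it before use.
$matchWithSpace = trim($m1[2]);
$matchWithoutSpace = trim($m2[2]);

echo $matchWithSpace, "\n";    // prints 57/1998
echo $matchWithoutSpace, "\n"; // prints 454
```

With the trailing space, "particella " cannot match "particella\n", so the engine moves on and the first alternative that succeeds is "n." two lines later. Dropping the space and leaving the whitespace to \s* covers spaces, carriage returns and line feeds alike.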
