In PHP, I'm matching the text here http://siba.thenetworksolution.it/allegati/H3018500D7FDDE9ACA05671F49F4F3746A69DAF96.1329514.pdf.txt with the following regex:
preg_match('#(.*(?s))(particella |particelle |p\.|part\.|p |part |mappale |mapp\.|mapp |n\.|\*)\s*(\d+[\d /\p{Pd}]*)($|.{0,20}(?s)(graffati|particella |particelle |p\.|.*part\.|p |part |mappale |mapp\.|mapp |n\.|subalterno |subalterni |sub\.|s\.|sub |s |\bcat\b|\bcategoria\b|\brendita\b|\bvani\b|\bconsistenza\b|\bR\.C\.\b))#i', $txt, $matches, PREG_OFFSET_CAPTURE, $offset)
with offset = 1155 (that is the offset of the word "foglio" in the text).
I expected it to match 454 (which comes just after the offset), but it matches 57/1998 instead (which comes many rows later).
After some tests on regex101.com I discovered that the issue is the carriage return between the prefix particella and 454, but I expected \s to match line feeds.
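A minimal sketch of one possible cause, using a made-up sample string rather than the linked text (the string, the simplified pattern, and the variable names are all assumptions for illustration). It shows that a prefix alternative ending in a literal space cannot match when the prefix is followed directly by a line break, even though the \s* that comes after it would accept the line break:

```php
<?php
// Hypothetical sample: "particella" is followed directly by CR LF, then 454.
$txt = "foglio 12 particella\r\n454 sub 3\nfoglio 9 particella 57/1998 sub 1";

// Simplified pattern with a literal space inside the prefix alternative.
// "particella " needs a real space, so the first occurrence is skipped.
preg_match('#(particella )\s*(\d+[\d /]*)#i', $txt, $m);
$withSpace = trim($m[2]);
echo $withSpace, "\n";      // 57/1998

// Same pattern with the whitespace left entirely to \s*.
// \s matches CR and LF, so the first occurrence now matches.
preg_match('#(particella)\s*(\d+[\d /]*)#i', $txt, $m);
$withoutSpace = trim($m[2]);
echo $withoutSpace, "\n";   // 454
```

If this is what happens with the real text, the literal trailing spaces in the prefix list, rather than \s itself, would be what prevents the match at the line break.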
How can I correct the greediness so that the regex matches 454?