You need to match a run of four digits that is not preceded by a dash:
/[^-](\d{4})/
Decomposing the regex:
- [^-] : not a dash
- \d{4} : four digits
- (\d{4}) : capture the digits
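As a quick check of how these pieces behave, here is a minimal sketch that runs the pattern against two of the file names. The variable names are illustrative only. Note that [^-] has to consume one character, so the pattern finds nothing when the four digits sit at the very start of the name.

```php
<?php
// Variable names here are illustrative, not part of the original answer.
$pattern = '/[^-](\d{4})/';

// An underscore precedes the digits, so [^-] has a character to consume.
$found = preg_match($pattern, 'jobname_0123_14-5678.p1.PDF', $m);
echo $found ? $m[1] . '.pdf' : 'no match', PHP_EOL;          // 0123.pdf

// Nothing precedes the digits here, so [^-] cannot match in front of them.
$foundAtStart = preg_match($pattern, '0123_14-5678_jobname.p1.PDF', $m2);
echo $foundAtStart ? $m2[1] . '.pdf' : 'no match', PHP_EOL;  // no match
```

That limitation is why the patterns in the loops further down use [^-]* instead, which is allowed to match zero characters.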
You can then append .pdf to get your file name.
Example with preg_replace and the file names you've given above in an array:
foreach ($files as $f) {
    echo "$f => " . preg_replace("/.*?[^-]*(\d{4}).+/", "$1.pdf", $f) . PHP_EOL;
}
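For a version that runs on its own, the sketch below fills $files with a few sample names of the kind used in this answer and collects the results in a hypothetical $renamed array so they can be inspected afterwards.

```php
<?php
// Sample names; $renamed is a helper array introduced for this sketch.
$files = [
    '14-5678_jobname_0123_.p1.PDF',
    'Weired_filename_0123_bla_14-5678_jobname.p1.PDF',
    '0125_14-5678_jobname.p1.PDF',
    'jobname_0125_14-5678.p1.PDF',
];

$renamed = [];
foreach ($files as $f) {
    // The whole name is matched and replaced by the captured digits plus ".pdf".
    $renamed[$f] = preg_replace('/.*?[^-]*(\d{4}).+/', '$1.pdf', $f);
    echo "$f => {$renamed[$f]}" . PHP_EOL;
}
```

Because [^-]* is greedy and never crosses a dash, the capture lands on a four-digit group in a dash-free stretch of the name rather than on the 5678 that follows the dash.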
ETA: if you want to factor in the page number, you could use this code:
foreach ($files as $f) {
    # this saves the four digits of the PDF name, and the number in p1/p2
    if (!preg_match("/.*?[^-]*(\d{4}).*?p(\d+)\.pdf/i", $f, $matches)) {
        # skip names that do not contain both parts
        continue;
    }
    # if the page number (from p1/p2) is greater than 1, add the offset to the PDF name number
    if ($matches[2] > 1) {
        $matches[1] += $matches[2] - 1;
    }
    # format the pdf name to be four digits long, with zero padding for shorter names
    echo "$f => " . sprintf('%04d.pdf', $matches[1]) . PHP_EOL;
}
Output:
14-5678_jobname_0123_.p1.PDF => 0123.pdf
14-5678_jobname_0123_.p2.PDF => 0124.pdf
14-5678_jobname_0125_.p1.PDF => 0125.pdf
Weired_filename_0123_bla_14-5678_jobname.p1.PDF => 0123.pdf
Weired_filename_0123_bla_14-5678_jobname.p2.PDF => 0124.pdf
Weired_filename_0125_bla_14-5678_jobname.p1.PDF => 0125.pdf
14-5678_jobname_0123.p1.PDF => 0123.pdf
14-5678_jobname_0123.p2.PDF => 0124.pdf
14-5678_jobname_0125.p1.PDF => 0125.pdf
0123_14-5678_jobname.p1.PDF => 0123.pdf
0123_14-5678_jobname.p2.PDF => 0124.pdf
0125_14-5678_jobname.p1.PDF => 0125.pdf
jobname_0123_14-5678.p1.PDF => 0123.pdf
jobname_0123_14-5678.p2.PDF => 0124.pdf
jobname_0125_14-5678.p1.PDF => 0125.pdf
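If you would rather have this logic in one place, it could be wrapped in a small function. The sketch below is one possible shape. The name pdfTargetName and the choice to return null for names that do not fit the pattern are assumptions made for illustration.

```php
<?php
// Hypothetical helper for this sketch: returns the target name, or null
// when the file name does not contain both the four digits and a pN suffix.
function pdfTargetName(string $file): ?string
{
    if (!preg_match('/.*?[^-]*(\d{4}).*?p(\d+)\.pdf/i', $file, $m)) {
        return null;
    }
    // Page 1 keeps the base number; page N (N > 1) adds N - 1 to it.
    $offset = max(0, (int) $m[2] - 1);
    return sprintf('%04d.pdf', (int) $m[1] + $offset);
}

echo pdfTargetName('14-5678_jobname_0123_.p2.PDF'), PHP_EOL;  // 0124.pdf
var_dump(pdfTargetName('no_digits_here.pdf'));                // NULL
```

Returning null leaves it to the caller to decide whether an unexpected file name should be skipped or reported.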