I have a string of the form:
Some Text[Opening]Really Really Long Text...[Closing]More Text[Closing]Even More Text
I want to use a regular expression to extract "Really Really Long Text..." from the string, i.e. everything between [Opening] and the first [Closing].
If I use a regular expression like this:
$pMatch = "'\[Opening\](.+)\[Closing\]'si";
That gives me:
Really Really Long Text...[Closing]More Text
I can also make it non-greedy (lazy) like this:
$pMatch = "'\[Opening\](.+?)\[Closing\]'si";
This works and gives me the correct output:
Really Really Long Text...
However, if I replace "Really Really Long Text..." with text that is actually really long, it doesn't work: preg_match() fails and preg_last_error() reports PREG_BACKTRACK_LIMIT_ERROR. The greedy regular expression doesn't produce an error; it just gives the wrong output, as in the first case.
I've been working with regular expressions for a while, but this one has me stumped. Is there a way to get this to work with a regular expression, or is a regular expression not suitable for this task?
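One commonly suggested way to keep a regular expression is to "unroll" the lazy loop. Roughly speaking, `(.+?)` takes one more character each time `\[Closing\]` fails to match, and each of those steps counts against PCRE's backtracking limit (the `pcre.backtrack_limit` ini setting, 1000000 by default), so a capture of about 4 million characters runs out. The sketch below, which is an assumption rather than a tested fix for every input, consumes runs of non-`[` characters possessively and only stops at a `[` to check whether it begins `[Closing]`. Unlike `.+?`, it also accepts empty content between the markers.

```php
<?php
// Unrolled form of the lazy pattern:
//   [^\[]*+            grab every non-'[' character at once, never giving any back
//   \[(?!Closing\])    a '[' is allowed only if it does not start "[Closing]"
//   (?: ... )*+        repeat that "one '[' plus following run" unit possessively
$pUnrolledMatch = "'\[Opening\]([^\[]*+(?:\[(?!Closing\])[^\[]*+)*+)\[Closing\]'si";

$sShortString = "Some Text[Opening]Really Really Long Text...[Closing]More Text[Closing]Even More Text";
$sLongString = "Some Text[Opening]" . str_repeat("BLAH", 1000000) . "[Closing]More Text[Closing]Even More Text";

if (preg_match($pUnrolledMatch, $sShortString, $aMatch)) {
    echo $aMatch[1], "\n";
}

if (preg_match($pUnrolledMatch, $sLongString, $aMatch)) {
    echo "Length: ", strlen($aMatch[1]), "\n";
} else {
    // preg_last_error() returns one of the PREG_*_ERROR constants.
    echo "No match, preg_last_error() = ", preg_last_error(), "\n";
}
```

Because the possessive quantifiers never hand characters back, a long run of text without `[` is consumed in a single step instead of one backtrack per character.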
Here is PHP code to reproduce the issue:
<?php
$sShortString = "Some Text[Opening]Really Really Long Text...[Closing]More Text[Closing]Even More Text";
$sLongString = "Some Text[Opening]" . str_repeat("BLAH", 1000000) . "[Closing]More Text[Closing]Even More Text";

// Greedy (.+) runs to the LAST [Closing]; lazy (.+?) stops at the FIRST one.
$pGreedyMatch = "'\[Opening\](.+)\[Closing\]'si";
$pNonGreedyMatch = "'\[Opening\](.+?)\[Closing\]'si";

header("Content-Type: text/plain");

if (preg_match($pGreedyMatch, $sShortString, $aMatch)) {
    echo "Greedy Match:\n";
    print_r($aMatch);
}
if (preg_match($pNonGreedyMatch, $sShortString, $aMatch)) {
    echo "Non-Greedy Match:\n";
    print_r($aMatch);
}
if (preg_match($pGreedyMatch, $sLongString, $aMatch)) {
    echo "Greedy Match:\n";
    echo "Length: " . strlen($aMatch[1]) . "\n";
}
if (preg_match($pNonGreedyMatch, $sLongString, $aMatch)) {
    echo "Non-Greedy Match:\n";
    echo "Length: " . strlen($aMatch[1]) . "\n";
} else {
    echo "Non-Greedy Doesn't Match!\n";
    // preg_match() returns false on an internal failure; preg_last_error() says why.
    if (preg_last_error() == PREG_BACKTRACK_LIMIT_ERROR) {
        echo "It's because the backtrack limit was exceeded!\n";
    }
}
?>
I get the output:
Greedy Match:
Array
(
[0] => [Opening]Really Really Long Text...[Closing]More Text[Closing]
[1] => Really Really Long Text...[Closing]More Text
)
Non-Greedy Match:
Array
(
[0] => [Opening]Really Really Long Text...[Closing]
[1] => Really Really Long Text...
)
Greedy Match:
Length: 4000018
Non-Greedy Doesn't Match!
It's because the backtrack limit was exceeded!
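To see which PCRE limit a failing call ran into, and what that limit is currently set to, the error code can be mapped to a message. This is a minimal sketch: describePregError() is a hypothetical helper, and PREG_JIT_STACKLIMIT_ERROR assumes PHP 7.0 or later. Whether the lazy pattern below fails, and with which code, depends on the local PCRE and JIT configuration, so no output is claimed.

```php
<?php
// Hypothetical helper: turn a preg_last_error() code into a readable message,
// including the ini setting that governs the limit where one applies.
function describePregError(int $iError): string
{
    switch ($iError) {
        case PREG_NO_ERROR:
            return "no error";
        case PREG_BACKTRACK_LIMIT_ERROR:
            return "backtrack limit exceeded (pcre.backtrack_limit = "
                . ini_get('pcre.backtrack_limit') . ")";
        case PREG_RECURSION_LIMIT_ERROR:
            return "recursion limit exceeded (pcre.recursion_limit = "
                . ini_get('pcre.recursion_limit') . ")";
        case PREG_JIT_STACKLIMIT_ERROR:
            return "JIT stack limit exceeded";
        default:
            return "other PCRE error (code $iError)";
    }
}

// Usage: run the lazy pattern on a long input and report why it failed, if it did.
$sText = "Some Text[Opening]" . str_repeat("BLAH", 1000000) . "[Closing]More Text";
if (preg_match("'\[Opening\](.+?)\[Closing\]'si", $sText, $aMatch) === false) {
    echo describePregError(preg_last_error()), "\n";
}
```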
I've got it working by using the greedy regular expression and then stripping off everything from the first [Closing] onward with additional code. I would like to better understand what's happening behind the scenes, why so much backtracking is needed, and whether the regular expression can be modified so that it performs the task on its own.
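The workaround described above can be sketched as follows. extractUpToFirstClosing() is a hypothetical name, and using stripos() to mirror the pattern's case-insensitive `i` flag is an assumption about the intended behaviour.

```php
<?php
// Hypothetical helper: greedy match first, then cut the capture at the
// first "[Closing]". Returns null when the markers are not found.
function extractUpToFirstClosing(string $sText): ?string
{
    // The greedy (.+) needs little backtracking here: it runs to the end of
    // the string and only steps back until the LAST [Closing] matches.
    if (!preg_match("'\[Opening\](.+)\[Closing\]'si", $sText, $aMatch)) {
        return null;
    }
    $sCapture = $aMatch[1];

    // Trim everything from the first [Closing] onward (case-insensitive,
    // like the 'i' flag on the pattern).
    $iPos = stripos($sCapture, "[Closing]");
    return $iPos === false ? $sCapture : substr($sCapture, 0, $iPos);
}

echo extractUpToFirstClosing("Some Text[Opening]Really Really Long Text...[Closing]More Text[Closing]Even More Text"), "\n";
```

The same result could be had without the regular expression at all, by locating [Opening] and the first [Closing] with stripos() and slicing with substr().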
I really appreciate any insight!