This sounds like a job for the Tokenizer!
You can fetch all of the parsed tokens from a PHP source file using token_get_all. You can then go through the resulting array, evaluating each token one at a time. Each multi-character token is an array whose type comes back as a number you can look up using token_name; single-character tokens such as [ and ; come back as plain strings.
A small demo at the PHP interactive prompt:
php > $str = '<?php echo $face[fire]; echo $face[\'fire\']; ?>';
php > $t = token_get_all($str);
php > foreach($t as $i => $j) { if(is_array($j)) $t[$i][0] = token_name($j[0]); }
And here's the output in a different code block, as it's a bit tall and it'll be good to reference the source string while scrolling through it.
php > print_r($t);
Array
(
[0] => Array
(
[0] => T_OPEN_TAG
[1] => <?php
[2] => 1
)
[1] => Array
(
[0] => T_ECHO
[1] => echo
[2] => 1
)
[2] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 1
)
[3] => Array
(
[0] => T_VARIABLE
[1] => $face
[2] => 1
)
[4] => [
[5] => Array
(
[0] => T_STRING
[1] => fire
[2] => 1
)
[6] => ]
[7] => ;
[8] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 1
)
[9] => Array
(
[0] => T_ECHO
[1] => echo
[2] => 1
)
[10] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 1
)
[11] => Array
(
[0] => T_VARIABLE
[1] => $face
[2] => 1
)
[12] => [
[13] => Array
(
[0] => T_CONSTANT_ENCAPSED_STRING
[1] => 'fire'
[2] => 1
)
[14] => ]
[15] => ;
[16] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 1
)
[17] => Array
(
[0] => T_CLOSE_TAG
[1] => ?>
[2] => 1
)
)
As you can see, our evil array indexes are a T_VARIABLE followed by an open bracket, then a T_STRING that is not quoted. Single-quoted indexes come through as T_CONSTANT_ENCAPSED_STRING, quotes and all.
With this knowledge in hand, you can go through the list of tokens and actually rewrite the source to eliminate all of the unquoted array indexes -- most of them should be pretty obvious. You can simply add single quotes around the string when you write the file back out.
Just keep in mind that you won't want to quote any numeric indexes, as that will surely have undesirable side-effects. Integer indexes come through as T_LNUMBER rather than T_STRING, so they are easy to tell apart.
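To make the rewrite step concrete, here is a minimal sketch, assuming PHP 7.1 or later. The function name quote_bare_indexes is made up for illustration. It only touches the exact pattern seen in the dump (a T_VARIABLE, an open bracket, a lone T_STRING, a close bracket) and deliberately skips anything inside double-quoted strings, backticks and heredocs, because "$face[fire]" is legal there and adding quotes would break it. It cannot tell a real constant from a forgotten quote, so review the result before committing it.

```php
<?php
// Hypothetical helper: wraps bare-word indexes such as $face[fire] in single quotes.
function quote_bare_indexes(string $source): string
{
    $tokens = token_get_all($source);
    $out = '';
    $inString = false; // true while inside "...", `...` or a heredoc

    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            // Single-character tokens come back as plain strings.
            if ($token === '"' || $token === '`') {
                $inString = !$inString;
            }
            $out .= $token;
            continue;
        }

        list($id, $text) = $token;
        if ($id === T_START_HEREDOC) {
            $inString = true;
        } elseif ($id === T_END_HEREDOC) {
            $inString = false;
        }

        // Match only: T_VARIABLE, '[', T_STRING, ']' with nothing in between.
        $isBareIndex = !$inString
            && $id === T_STRING
            && $i >= 2
            && is_array($tokens[$i - 2]) && $tokens[$i - 2][0] === T_VARIABLE
            && $tokens[$i - 1] === '['
            && isset($tokens[$i + 1]) && $tokens[$i + 1] === ']';

        $out .= $isBareIndex ? "'" . $text . "'" : $text;
    }

    return $out;
}
```

Since every other token is written back verbatim, you can feed it the result of file_get_contents and pass the return value to file_put_contents. Numeric indexes are left alone because they never tokenize as T_STRING.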
Also keep in mind that expressions are legal inside of indexes:
$pathological[ some_function('Oh gods', 'why me!?') . '4500' ] = 'Teh bad.';
You'll have a teeny tiny, slightly harder time dealing with these with an automated tool. By which I mean trying to handle them may cause you to fly into a murderous rage. I suggest only trying to fix the constant/string problem now. If done correctly, you should be able to get the Notice count down to a more manageable level.
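If you would rather know where the pathological cases live than fix them automatically, something along these lines can list them for manual review. This is only a sketch: find_complex_indexes is a made-up name, and "complex" here simply means the index holds more than one non-whitespace token, so something harmless like $a[-1] gets flagged too.

```php
<?php
// Hypothetical helper: returns the line numbers of array accesses whose
// index is more than a single token, so a human can look at them.
function find_complex_indexes(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $lines = [];

    for ($i = 0; $i < $count - 1; $i++) {
        $isVariable = is_array($tokens[$i]) && $tokens[$i][0] === T_VARIABLE;
        if (!$isVariable || $tokens[$i + 1] !== '[') {
            continue;
        }

        // Walk to the matching ']' and count the non-whitespace tokens inside.
        $depth = 1;
        $inner = 0;
        for ($j = $i + 2; $j < $count; $j++) {
            $t = $tokens[$j];
            if ($t === '[') {
                $depth++;
            } elseif ($t === ']') {
                $depth--;
                if ($depth === 0) {
                    break;
                }
            }
            if (!is_array($t) || $t[0] !== T_WHITESPACE) {
                $inner++;
            }
        }

        if ($inner > 1) {
            $lines[] = $tokens[$i][2]; // line on which the variable appears
        }
    }

    return $lines;
}
```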
(Also note that the Tokenizer deals with the curly string syntax as an actual token, T_CURLY_OPEN -- this should make those pesky inlined array indexes easier to deal with. Here's the list of all tokens once again, just in case you missed it.)
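To see the curly syntax for yourself, a quick script like the following prints the token names for a string that uses it. The sample source string is made up for the demo; run it and compare the output against the dump above.

```php
<?php
// Sample source (an assumption for this demo) with a curly-syntax index in a string.
$source = '<?php echo "{$face[\'fire\']}"; ?>';

$names = [];
foreach (token_get_all($source) as $token) {
    // Array tokens carry a numeric ID; single characters are plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

echo implode(' ', $names), "\n";
```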