This is a sentence sanitizer.
function sanitize_sentence($string) {
    $pats = array(
        '/([.!?]\s{2}),/',  # Abc. ,Def
        '/\.+(,)/',         # ......,
        '/(!|\?)\1+/',      # abc!!!!!!!!, abc?????????
        '/\s+(,)/',         # abc , def
        '/([a-zA-Z])\1\1/'  # greeeeeeen
    );
    $fixed = preg_replace($pats, '$1', $string); # apply pats
    $fixed = preg_replace('/(?:(?<=\s)|^)[^a-z0-9]+(?:(?=\s)|$)/i', '', $fixed); # bad chunks
    $fixed = preg_replace('/([!?,.])(\S)/', '$1 $2', $fixed); # space after punctuation, if one doesn't exist already
    $fixed = preg_replace('/[^a-zA-Z0-9!?.]+$/', '.', $fixed); # end of string must end in period
    $fixed = preg_replace('/,(?!\s)/', ', ', $fixed); # spaces after commas
    return $fixed;
}
This is the test sentence:
hello [[[[[[]]]]]] friend.....? how are you [}}}}}}
It should return:
hello friend.....? how are you
But instead it is returning:
hello friend. .. .. ? how are you.
So there are 2 problems and I can't find a way around them:
- the run of periods is being split into ". .. .." for some reason. It should remain as "....." next to the question mark.
- the string must end in a period if and only if at least one of these characters appears anywhere in the string: !?,. (if none of those characters is found in the string, that preg_replace should not be executed)
Examples for the second problem:
This sentence doesn't need an ending period because the mentioned characters are nowhere to be found
This other sentence, needs it! Why? Because it contains at least one of the mentioned characters
(of course, the ending period should only be added if the string doesn't already end in one)
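To make the two requirements concrete, here is a minimal sketch of the behavior described above. The helper names `space_after_punctuation` and `maybe_end_with_period` are hypothetical and not part of the original function. The sketch also assumes that a punctuation mark followed by more punctuation should never get a space, and that a string already ending in `.`, `!` or `?` counts as finished. Note that under the stated rule the test sentence would still receive a final period, since it contains `.` and `?`.

```php
<?php
// Hypothetical helper: add a space after ! ? , . only when the next
// character is neither whitespace nor more punctuation, so a run such
// as ".....?" is left intact.
function space_after_punctuation(string $s): string {
    return preg_replace('/([!?,.])(?![!?,.\s]|$)/', '$1 ', $s);
}

// Hypothetical helper: add a final period only when the string contains
// at least one of ! ? , . and does not already end in . ! or ?
function maybe_end_with_period(string $s): string {
    // No trigger character anywhere: return the string unchanged.
    if (!preg_match('/[!?,.]/', $s)) {
        return $s;
    }
    $s = rtrim($s);
    // Already ends in terminal punctuation: nothing to add.
    if (preg_match('/[!?.]$/', $s)) {
        return $s;
    }
    // Drop trailing non-alphanumeric characters, then add the period.
    return preg_replace('/[^a-zA-Z0-9]+$/', '', $s) . '.';
}

echo space_after_punctuation('hello friend.....? how are you'), "\n";
// prints: hello friend.....? how are you
```

The negative lookahead is what keeps the periods together: a mark is only followed by an inserted space when the next character is a letter, digit, or other non-punctuation symbol.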
Thanks for your help!