We need to remove special characters from text strings. For example, we may get a string like this, where &#174; is the HTML entity for the registered trademark symbol:
PEPSI&#174; Bottle 20 oz<br><br>
I'm not great with regex, and can't figure out how to edit the existing code to get the result we want.
Here's what we currently have:
$ui = "PEPSI&#174; Bottle 20 oz<br><br>";
$ui = preg_replace('/[^A-Za-z0-9\.\' -]/', '', $ui);
This results in:
PEPSI174 Bottle 20 ozbrbr
Our desired result is:
PEPSI Bottle 20 oz<br><br>
How can I edit the regex to make sure that:
- It doesn't remove valid HTML tags like <br>, and
- If it does find a special character entity, it removes not only the special characters (the & and #), but also the numbers and the semicolon?
We don't want to have it remove all the numbers, as obviously the string can contain numbers; it's only numbers that are part of the entity code that we need to remove.
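One direction that might work is two passes instead of a single regex: first delete whole numeric entities, then apply the existing character filter only to the text between tags. The following is a rough sketch, not a tested solution. The function name clean_text is made up for illustration, and it assumes that anything shaped like <name ...> or </name> counts as a "valid" tag (it does not check tag names against a whitelist) and that only numeric entities such as &#174; or &#xAE; need to go.

```php
<?php
// Hypothetical helper; the name clean_text is not part of the existing code.
function clean_text(string $text): string
{
    // Pass 1: remove whole numeric entities, decimal (&#174;) or hex (&#xAE;),
    // so the digits and the semicolon disappear together with the & and #.
    $text = preg_replace('/&#(?:\d+|x[0-9a-f]+);/i', '', $text);

    // Pass 2: split on tag-shaped substrings, keeping the tags in the result.
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, even indexes hold
    // plain text and odd indexes hold the captured tags.
    $parts = preg_split(
        '/(<\/?[a-z][a-z0-9]*\b[^>]*>)/i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Same character filter as the existing code, applied to text only.
            $parts[$i] = preg_replace('/[^A-Za-z0-9.\' -]/', '', $part);
        }
    }

    return implode('', $parts);
}

echo clean_text("PEPSI&#174; Bottle 20 oz<br><br>"), PHP_EOL;
```

Numbers that are part of the text (the 20 in the example) are left alone because pass 1 only matches digits that sit between &# and a semicolon. Named entities such as &reg; are not handled here and would need an extra alternative in the first pattern.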