I have a MySQL query that returns data which I format into an XML file. One of the columns is a free-text field that can contain strange characters that "break" the XML with an encoding error. I believe these characters are Microsoft Word "smart" quotes that got into the record when the user originally pasted the text in from Word. I do not have control over that process.
Example of the strange characters:
â€œTURN KEY â€“ Totally Furnishedâ€
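One way to sanity-check the Word-quote theory is to reproduce the garbling. If UTF-8 curly quotes and dashes are read as Windows-1252, each three-byte character turns into three Latin characters. This is a minimal sketch, assuming the mbstring extension is available; it only demonstrates the effect and is not a fix.

```php
<?php
// Assumption: the record holds UTF-8 smart punctuation that something
// downstream displays as if it were Windows-1252.
$openingQuote = "\u{201C}"; // left double quotation mark, UTF-8 bytes E2 80 9C
$enDash       = "\u{2013}"; // en dash, UTF-8 bytes E2 80 93

// Reinterpret the UTF-8 bytes as Windows-1252 to reproduce the garbling.
$garbledQuote = mb_convert_encoding($openingQuote, 'UTF-8', 'Windows-1252');
$garbledDash  = mb_convert_encoding($enDash, 'UTF-8', 'Windows-1252');

echo $garbledQuote, PHP_EOL; // â€œ
echo $garbledDash, PHP_EOL;  // â€“
```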
I am using htmlspecialchars to "clean" this data, but it effectively blanks that field in the XML record. This fixes the encoding error, but the record is now missing the data for that field. I still want the data; I just want to omit the weird characters, or replace them with something like a dash.
$description = htmlspecialchars($row['PropertyInformation'], ENT_QUOTES, 'UTF-8');
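A possible explanation for the blank field: without the ENT_SUBSTITUTE or ENT_IGNORE flag, htmlspecialchars() returns an empty string when its input is not valid in the given encoding. The sketch below assumes (this is not confirmed in the question) that the value PHP receives contains raw Windows-1252 bytes such as 0x93 for the opening quote, which are invalid UTF-8. The sample string is hypothetical.

```php
<?php
// Hypothetical sample: Word quote/dash bytes as Windows-1252, i.e. invalid UTF-8.
$raw = "\x93TURN KEY \x96 Totally Furnished\x94";

// Same call as above: returns "" because the input is not valid UTF-8.
$blank = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE keeps the text and replaces each invalid byte with U+FFFD.
$kept = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($blank); // string(0) ""
echo $kept, PHP_EOL;
```

ENT_SUBSTITUTE keeps the rest of the text, but the quotes become replacement characters rather than dashes, so it is only a partial answer to "change them to something like a dash".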
In the records where the weird characters occur, the XML output ends up like this:
<DESCRIPTIF> <![CDATA[ ]]> </DESCRIPTIF>
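A sketch of keeping the data while swapping the Word punctuation for plain ASCII, then writing it into the CDATA section. It assumes the mbstring and DOM extensions are available and that any non-UTF-8 text was stored as Windows-1252; normalizeWordPunctuation() is a hypothetical helper, not part of PHP. Because the content goes inside CDATA, it is not passed through htmlspecialchars() here (an entity such as &amp; inside CDATA would be read literally).

```php
<?php
// Hypothetical helper: turn Word "smart" punctuation into plain ASCII.
function normalizeWordPunctuation(string $text): string
{
    // Assumption: text that is not valid UTF-8 was stored as Windows-1252.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }

    // Replace the common Word substitutions with ASCII equivalents.
    return strtr($text, [
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ]);
}

// Hypothetical stand-in for $row['PropertyInformation'].
$description = normalizeWordPunctuation("\x93TURN KEY \x96 Totally Furnished\x94");

// Build <DESCRIPTIF> with a CDATA child holding the cleaned text.
$doc = new DOMDocument('1.0', 'UTF-8');
$node = $doc->createElement('DESCRIPTIF');
$node->appendChild($doc->createCDATASection($description));
$doc->appendChild($node);

echo $doc->saveXML($node), PHP_EOL;
// <DESCRIPTIF><![CDATA["TURN KEY - Totally Furnished"]]></DESCRIPTIF>
```

If the curly quotes should be kept instead, dropping the strtr() step leaves valid UTF-8 after the Windows-1252 conversion.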