Most likely, your string is encoded in UTF-8. UTF-8 represents many
characters with more than one byte. For example, the Euro symbol € is
represented in UTF-8 by the three bytes E2, 82, AC.
But your software is interpreting the string using a one-byte
encoding, such as ISO-8859-1. This causes each byte of the 3-byte
character to be interpreted as a separate character. E2, for example,
is being displayed as â, when it is actually only the first byte of a
3-byte character.
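
To see this for yourself, here is a minimal sketch that inspects the bytes of the Euro symbol and then deliberately misreads them as ISO-8859-1. It assumes the mbstring extension is available; the variable names are only illustrative.

```php
<?php
// The Euro sign written as explicit UTF-8 bytes, so the example does not
// depend on the encoding this source file is saved in.
$euro = "\xE2\x82\xAC";

echo strlen($euro), "\n";     // 3 (strlen counts bytes, not characters)
echo bin2hex($euro), "\n";    // e282ac

// Treat the same three bytes as ISO-8859-1 and re-encode the result as
// UTF-8 so it can be printed: every byte becomes its own character.
$misread = mb_convert_encoding($euro, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($misread, 'UTF-8'), "\n";           // 3 characters, not 1
echo mb_substr($misread, 0, 1, 'UTF-8'), "\n";     // â (the E2 byte on its own)
```

The first byte shows up as â, which is the symptom described above.
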
utf8_encode() is not the solution to this. It takes an ISO-8859-1
encoded string and returns a UTF-8 string. You already have a UTF-8
string, so calling it would encode every byte a second time and make
the output worse, not better.
You have a couple of options.
One, fix whatever uses the string so that it expects the string to
contain UTF-8. That will properly preserve the characters that are
in the string. For example, if you are writing the string as part of
a web page, ensure that the page declares UTF-8 as its character
encoding, in the Content-Type header or in a meta tag.
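
For the web page case, a minimal sketch might look like the following. render_page() is a hypothetical helper invented for this illustration, and the page content is made up; the relevant parts are the charset in the header, the meta tag, and the encoding argument to htmlspecialchars().

```php
<?php
// Hypothetical helper: wraps UTF-8 text in a minimal HTML page that
// declares its own encoding.
function render_page(string $text): string
{
    // Tell htmlspecialchars() the input is UTF-8 so multibyte characters
    // pass through intact while <, >, & and quotes are escaped.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n<meta charset=\"utf-8\">\n"
         . "<title>Example</title>\n</head>\n"
         . "<body><p>{$safe}</p></body>\n</html>\n";
}

// Declare the encoding in the HTTP header as well. This has to happen
// before any output is sent.
header('Content-Type: text/html; charset=utf-8');

echo render_page("Price: \xE2\x82\xAC5");
```

If the string comes from a database or a file, that source has to deliver UTF-8 as well; the page declaration only covers the last step.
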
Two, convert the string to whatever encoding you are actually using.
For example, you can convert the string from UTF-8 to ISO-8859-1
with utf8_decode(). The disadvantage is that ISO-8859-1 cannot
represent as many different characters as UTF-8, so any character it
lacks (the Euro symbol is one of them) will be lost in the
conversion; utf8_decode() replaces it with a question mark.
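
The loss is easy to demonstrate. The sketch below uses mb_convert_encoding() instead of utf8_decode(), because utf8_decode() and utf8_encode() are deprecated as of PHP 8.2; it assumes the mbstring extension is available, and the sample string is made up.

```php
<?php
// "café €5" written as explicit UTF-8 bytes.
$utf8 = "caf\xC3\xA9 \xE2\x82\xAC5";

// Make the replacement for unrepresentable characters explicit: "?" (0x3F).
mb_substitute_character(0x3F);

// é exists in ISO-8859-1 and becomes the single byte E9;
// € does not exist there and is replaced.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n";      // 636166e9203f35

// Converting back does not restore the original, so the loss is permanent.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($roundTrip === $utf8);   // bool(false)
```

A round-trip comparison like the last line is a cheap way to check whether a particular string survives the conversion. If the Euro symbol specifically matters, ISO-8859-15 and Windows-1252 both include it, although that only helps if the consuming software really uses one of those encodings.
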