I know this question comes up in one form or another all the time on here, but I'm kind of at a loss on how to resolve it. I've got a PHP website that's running off of MySQL, that's showing some extended characters as a garbled mess. As far as I know it's all encoded as UTF-8, on every step from the content import to displaying it on the screen. Still, it's showing weird encoding issues. Here's the first test example (Natural Phënåm¥na, this is on purpose), which mb_detect_encoding
identifies as UTF-8, which I can only get to display correctly with utf8_decode
:
no utf8_decode: Natural Phënåm¥na
utf8_decode: Natural Phënåm¥na
Second example, which never even utf8_decodes properly (should be an ümlaut and “typographer's quotes” (extended characters added on purpose, as a test:
no utf8_decode: This pürson from “Vancouver, Canadaâ€
utf8_decode: This pürson from �??Vancouver, Canada�?�
My initial thought was it was doubly encoded, but I don't think that's what's going on. Everything is displaying correctly in MySQL when I do queries on the command line.
Here's a rundown of all the things I've investigated:
- Content imported is verified to be UTF-8, imported with UTF-8 connection to MySQL
- MySQL Database, tables, columns are UTF-8, utf_unicode_*
- character_set_client, etc variables in MySQL set to utf8 on Amazon RDS
- PHP PDO connection is UTF-8, NAME set to UTF-8
- Both PHP header charset and HTML meta charset are UTF-8
- mb_detect_encoding is returning UTF-8 for both strings
So after a few hours of troubleshooting, I'm kind of at a loss. On a whim I even tried setting the HTML header/meta and PHP headers to ISO-8559-1, but that's not doing the trick either.
I last spent a while battling with Amazon RDS to get the right variables set, but otherwise I'm out of ideas.
mysql> show variables like '%character%';
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.5.40.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
So I'm wondering, are there steps I'm missing? Something obvious? Thanks in advance.
UPDATE
Here's my PHP output script, for further clarification on the "output" that I mentioned:
<?php header("Content-type: text/html; charset=utf-8"); ?>
<html>
<header>
<meta charset="utf-8" />
<title>My test</title>
</header>
<body>
<?php
try {
$dbh = new PDO("mysql:host=localhost;dbname=database",
"user", "password", array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
}
catch(PDOException $e) {
echo $e->getMessage();
}
$sth = $dbh->prepare("my select statement");
$sth->execute();
$rows = $sth->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $row) {
echo mb_detect_encoding($row['name']);
echo "<br>no utf8 decode: ". $row['name'] . "<br>
";
echo "single utf8 decode: ". utf8_decode($row['name']) . "<br>
";
echo "no utf8 decode: ". $row['description'] . "<br>
";
echo "single utf8 decode: ". (utf8_decode($row['description'])) . "<br>
";
}
?>
</body>
</html>
UPDATE #2 I tried also just outputting these same characters into the browser directly from a PHP echo, and straight static HTML, and the characters display perfectly fine.
echo "“test ü ö”<br>"; ?>
<p>“test ü ö”</p>