dongliming2416 2015-06-23 00:47

PHP output encoding issues with UTF-8 strings from a MySQL database

I know this question comes up in one form or another all the time on here, but I'm at a loss on how to resolve it. I've got a PHP website running on MySQL that shows some extended characters as a garbled mess. As far as I know, everything is encoded as UTF-8 at every step, from the content import to displaying it on the screen, yet the output is still garbled. Here's the first test example (Natural Phënåm¥na, the odd characters are on purpose), which mb_detect_encoding identifies as UTF-8 and which I can only get to display correctly with utf8_decode:

no utf8_decode: Natural Phënåm¥na
utf8_decode: Natural Phënåm¥na

Second example, which never decodes properly even with utf8_decode (it should contain an ümlaut and “typographer's quotes”; the extended characters were added on purpose, as a test):

no utf8_decode: This pürson from “Vancouver, Canadaâ€
utf8_decode: This pürson from �??Vancouver, Canada�?�

My initial thought was it was doubly encoded, but I don't think that's what's going on. Everything is displaying correctly in MySQL when I do queries on the command line.

Here's a rundown of all the things I've investigated:

  • Content imported is verified to be UTF-8, imported with UTF-8 connection to MySQL
  • MySQL database, tables, and columns are UTF-8, utf8_unicode_*
  • character_set_client, etc. variables in MySQL set to utf8 on Amazon RDS
  • PHP PDO connection is UTF-8, with SET NAMES utf8
  • Both PHP header charset and HTML meta charset are UTF-8
  • mb_detect_encoding is returning UTF-8 for both strings

So after a few hours of troubleshooting, I'm at a loss. On a whim I even tried setting the HTML meta charset and the PHP header to ISO-8859-1, but that didn't do the trick either.

I last spent a while battling with Amazon RDS to get the right variables set, but otherwise I'm out of ideas.

mysql> show variables like '%character%';
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8                                      |
| character_set_connection | utf8                                      |
| character_set_database   | utf8                                      |
| character_set_filesystem | utf8                                      |
| character_set_results    | utf8                                      |
| character_set_server     | utf8                                      |
| character_set_system     | utf8                                      |
| character_sets_dir       | /rdsdbbin/mysql-5.5.40.R1/share/charsets/ |
+--------------------------+-------------------------------------------+

So I'm wondering, are there steps I'm missing? Something obvious? Thanks in advance.

UPDATE

Here's my PHP output script, for further clarification on the "output" that I mentioned:

<?php header("Content-Type: text/html; charset=utf-8"); ?>
<html>
<head>
    <meta charset="utf-8" />
    <title>My test</title>
</head>
<body>
<?php

try {
    // Connect and set the connection character set to utf8.
    $dbh = new PDO("mysql:host=localhost;dbname=database",
        "user", "password", array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
} catch (PDOException $e) {
    // Stop here: $dbh is undefined if the connection failed.
    exit($e->getMessage());
}

$sth = $dbh->prepare("my select statement");
$sth->execute();
$rows = $sth->fetchAll(PDO::FETCH_ASSOC);

// Print each field as stored and after a single utf8_decode() pass.
foreach ($rows as $row) {
    echo mb_detect_encoding($row['name']);
    echo "<br>no utf8 decode: " . $row['name'] . "<br>\n";
    echo "single utf8 decode: " . utf8_decode($row['name']) . "<br>\n";
    echo "no utf8 decode: " . $row['description'] . "<br>\n";
    echo "single utf8 decode: " . utf8_decode($row['description']) . "<br>\n";
}

?>
</body>
</html>

UPDATE #2 I tried also just outputting these same characters into the browser directly from a PHP echo, and straight static HTML, and the characters display perfectly fine.

<?php echo "“test ü ö”<br>"; ?>
<p>“test ü ö”</p>

  • dongye9820 2015-06-24 20:31

    You should not change all the character_set% variables, just the three that are affected by SET NAMES utf8; (character_set_client, character_set_connection, and character_set_results).

    Don't use utf8_encode() or utf8_decode().

    You have probably messed up when storing.
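To illustrate what "messed up when storing" can look like, here is a minimal sketch, assuming the mbstring extension is available. The helper name simulate_double_encoding is hypothetical and not part of the asker's code. It simulates UTF-8 bytes being treated as Latin-1 and converted to UTF-8 a second time, which is one common way this kind of garbling arises. ISO-8859-1 is used to keep the byte mapping simple; MySQL's latin1 is actually based on cp1252, which is why the typographer's quotes show up as sequences beginning with â€.

```php
<?php
// Hypothetical helper: simulate UTF-8 bytes being misread as Latin-1
// and then encoded as UTF-8 a second time ("double encoding").
function simulate_double_encoding(string $utf8): string
{
    // Each input byte is treated as one ISO-8859-1 character.
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// "Phënåm¥na" written as explicit UTF-8 bytes, so the result does not
// depend on the encoding of this source file.
$original = "Ph\xC3\xABn\xC3\xA5m\xC2\xA5na";
$mangled  = simulate_double_encoding($original);

// A single "ë" (C3 AB) becomes the two characters "Ã«" (C3 83 C2 AB).
echo bin2hex("\xC3\xAB"), " -> ", bin2hex(simulate_double_encoding("\xC3\xAB")), "\n";
// prints "c3ab -> c383c2ab"

// The mangled text is still valid UTF-8, so encoding detection cannot flag it.
var_dump(mb_check_encoding($mangled, 'UTF-8')); // bool(true)

// Converting back to ISO-8859-1 (what utf8_decode does) restores the original bytes.
var_dump(mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8') === $original); // bool(true)
```

If the stored data was mangled this way, it would explain why mb_detect_encoding reports UTF-8 and why a single utf8_decode appears to repair the first example.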

    The following expression seems to recover the characters, but it is not a viable fix:

    CONVERT(CAST(CONVERT('pürson from “Vancouver, Canadaâ€' USING latin1)
                 AS BINARY)
            USING utf8)
    --> 'pürson from “Vancouver, Canada - spec',
    

    In order to figure out what was done, please provide

    SELECT col, HEX(col) FROM tbl WHERE ...
    

    for some cell that is not rendering properly.
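As a rough guide to reading that HEX() output, the sketch below classifies how a "ü" was stored. The helper name classify_u_umlaut and the sample hex strings are illustrative assumptions, not output from the asker's table. It relies only on the known byte sequences: C3BC for correctly stored UTF-8, C383C2BC for double-encoded UTF-8, and a lone FC for a Latin-1 byte.

```php
<?php
// Hypothetical helper: given the HEX() of a cell that should contain "ü",
// report how the character appears to have been stored.
function classify_u_umlaut(string $hex): string
{
    // Reject anything that is not an even-length hex string.
    if ($hex === '' || strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        return 'not a hex string';
    }
    $bytes = hex2bin($hex);

    // Check the longer, double-encoded sequence first.
    if (strpos($bytes, "\xC3\x83\xC2\xBC") !== false) {
        return 'double-encoded UTF-8';
    }
    if (strpos($bytes, "\xC3\xBC") !== false) {
        return 'correct UTF-8';
    }
    if (strpos($bytes, "\xFC") !== false) {
        return 'latin1 byte';
    }
    return 'no ü found';
}

// Illustrative values for the word "pürson".
echo classify_u_umlaut('70C3BC72736F6E'), "\n";     // correct UTF-8
echo classify_u_umlaut('70C383C2BC72736F6E'), "\n"; // double-encoded UTF-8
echo classify_u_umlaut('70FC72736F6E'), "\n";       // latin1 byte
```

The same comparison can be done by eye on the HEX(col) output; the helper only makes the three cases explicit.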
