douzhi2988 2015-09-07 21:49
浏览 63
已采纳

如何支持emojis蓬勃发展?

I am using flourishlib for a website. My client requested that we should be able to use emojis with mobile phones. In theory we should change the character-encoding from utf8 to utf8mb4 for the MySQL database.

So far, so good, however, if we make this switch, like this:

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)

Then each character will use four bytes instead of three bytes. This would increase the database's size with 33%. This would result in worse performance and more storage space used up. So, as a result, we have decided to switch to an encoding of utf8mb4 for only specific columns of specific tables.

To make sure everything is all right, I have checked several things. Among them, I have checked flourishlib and found a few suspect parts:

  1. There is an fUTF8 class, which does not seem to support utf8mb4

  2. At fDatabase I am quoting some findings:

    if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8', $this->connection)) {
        throw new fConnectivityException(
            'There was an error setting the database connection to use UTF-8'
        );
    }
    //...
    // Make MySQL act more strict and use UTF-8
    if ($this->type == 'mysql') {
        $this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
        $this->execute("SET NAMES 'utf8'");
        $this->execute("SET CHARACTER SET utf8");
    }
    
  3. At fSQLSchemaTranslation I can see this:

    $sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8', $sql);
    

I have the suspicion that flourishlib will not support our quest of making a few columns of a few table have a character encoding of utf8mb4. I wonder whether we can upgrade something somehow to make this support. As a worst-case scenario, we can override every textual occurrence of utf8 to utf8mb4. However, that would be a very ugly hack and we wonder whether there is a better solution. Should we make this hack or is there a more orthodox approach?

  • 写回答

2条回答 默认 最新

  • dongsu0308 2015-09-09 13:14
    关注

    I have resolved the issue. I have altered the tables where I wanted to support emojis by changing the column character set and collation, like this:

    ALTER TABLE table_name CHANGE column_name column_name text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    

    After that, I had to make a few ugly hacks to make flourishlib able to support emojis.

    fDatabase.php:

    line 685:

            if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8mb4', $this->connection)) {
                throw new fConnectivityException(
                    'There was an error setting the database connection to use UTF-8'
                );
            }
    

    line 717 stays the same, everything crashes if this line is changed:

    if ($this->connection && function_exists('mysqli_set_charset') && !mysqli_set_charset($this->connection, 'utf8')) {
    

    line 800:

        // Make MySQL act more strict and use UTF-8
        if ($this->type == 'mysql') {
            $this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
            $this->execute("SET NAMES 'utf8mb4'");
            $this->execute("SET CHARACTER SET utf8mb4");
        }
    

    fSQLSchemaTranslation.php:

    line 1554:

    $sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8mb4', $sql);
    

    fXML.php:

    line 403:

        if (preg_replace('#[^a-z0-9]#', '', strtolower($encoding)) == 'utf8mb4') {
            // Remove the UTF-8 BOM if present
            $xml = preg_replace("#^\xEF\xBB\xBF#", '', $xml);
            fCore::startErrorCapture(E_NOTICE);
            $cleaned = self::iconv('UTF-8', 'UTF-8', $xml);
            if ($cleaned != $xml) {
                $xml = self::iconv('Windows-1252', 'UTF-8', $xml);
            }
            fCore::stopErrorCapture();
        }
    

    and finally, when there are modifications for any of the columns affected, I execute this:

    App::db()->query("set names 'utf8mb4'");
    

    which, essentially triggers the ->query() execution of an fDatabase object.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥20 删除和修改功能无法调用
  • ¥15 kafka topic 所有分副本数修改
  • ¥15 小程序中fit格式等运动数据文件怎样实现可视化?(包含心率信息))
  • ¥15 如何利用mmdetection3d中的get_flops.py文件计算fcos3d方法的flops?
  • ¥40 串口调试助手打开串口后,keil5的代码就停止了
  • ¥15 电脑最近经常蓝屏,求大家看看哪的问题
  • ¥60 高价有偿求java辅导。工程量较大,价格你定,联系确定辅导后将采纳你的答案。希望能给出完整详细代码,并能解释回答我关于代码的疑问疑问,代码要求如下,联系我会发文档
  • ¥50 C++五子棋AI程序编写
  • ¥30 求安卓设备利用一个typeC接口,同时实现向pc一边投屏一边上传数据的解决方案。
  • ¥15 SQL Server analysis services 服务安装失败