douyu4535 2012-08-21 21:52

PHP and Informix: CLIENT_LOCALE and DB_LOCALE not working as expected (encoding-related)

I am using the PHP PDO_Informix driver v1.2.7, and the Informix client version is 3.70. I have some UTF-8 code that queries a Latin-1 database (the Informix server is 9.21).

The problem is that the driver truncates some of the returned strings. It's as if special characters count double. If a column 'name' has type varchar(2) and its value is 'áa', the value returned by a query is 'á' instead of 'áa'. If I resize the column to varchar(3), the result is correct. Below is a short script that reproduces the bug. I included the DSN so you can see the encoding settings.

Test script:

<?php
// Client code is UTF-8 (client_locale=en_us.utf8); the database is Latin-1 (db_locale=en_us.819).
$dsn = "informix:database=base;server=ol_server;host=192.168.123.123;client_locale=en_us.utf8;db_locale=en_us.819;service=1526;protocol=olsoctcp;EnableScrollableCursors=1";
$db = new \PDO($dsn, 'user', 'pass');
$db->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION); // throw on SQL errors instead of failing silently
$db->exec("CREATE TABLE ticket82 ( name VARCHAR(2) );");         // VARCHAR size is counted in bytes
$db->exec("INSERT INTO ticket82 VALUES ('aa');");

$statement = $db->query("select name from ticket82;");
$value = $statement->fetchAll(\PDO::FETCH_ASSOC);
echo "expected 'aa' got '{$value[0]['NAME']}'\n";

// 'á' is 1 byte in Latin-1 on the server but 2 bytes in UTF-8 on the client
$db->exec("update ticket82 set name='áa';");
$statement = $db->query("select name from ticket82;");
$value = $statement->fetchAll(\PDO::FETCH_ASSOC);
echo "expected 'áa' got '{$value[0]['NAME']}'\n";

// Widening the column leaves room for the 3 UTF-8 bytes of 'áa'
$db->exec("ALTER TABLE ticket82 MODIFY (name varchar(3));");
$statement = $db->query("select name from ticket82;");
$value = $statement->fetchAll(\PDO::FETCH_ASSOC);
echo "expected 'áa' got '{$value[0]['NAME']}'\n";

$db->exec("DROP TABLE ticket82;");

Expected result:

expected 'aa' got 'aa'
expected 'áa' got 'áa'
expected 'áa' got 'áa'

Actual result:

expected 'aa' got 'aa'
expected 'áa' got 'á'
expected 'áa' got 'áa'

Any ideas?


  • duanke9540 2012-08-22 14:40

    In a slightly weird way, I think that is the 'expected' or 'working as designed' behaviour.

    The column size is specified in bytes rather than characters, but for the database code set (ISO 8859-1, aka Latin-1) there is no difference, because every character is a single byte. The client-side code (PDO Informix) assumes that the variable receiving the value needs the same number of bytes of storage.

    However, the client-side code set is UTF-8 rather than 8859-1, and some 8859-1 characters require 2 bytes in UTF-8. To be precise, characters in the 'ASCII' range U+0000..U+007F require 1 byte in UTF-8, but those in the 'accented' range U+0080..U+00FF require 2 bytes. Because the client side has limited its variables to 2 bytes (rather than 2 characters), you can only select a single accented character from a VARCHAR(2) column: 'áa' becomes 3 bytes in UTF-8 (2 for 'á', 1 for 'a'), and only the first 2 fit.
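    The byte arithmetic can be checked without an Informix server. The minimal PHP sketch below counts how many UTF-8 bytes a Latin-1 value needs and then simulates a 2-byte client buffer by plain byte truncation. The helpers utf8BytesFor() and utf8LengthOfLatin1() are hypothetical, not part of PDO or CSDK, and the sketch assumes (rather than shows) that the real client truncates the same way.

```php
<?php
// Sketch of the byte arithmetic behind the truncation. String literals use
// explicit byte escapes so the result does not depend on this file's encoding.

// Hypothetical helper: bytes UTF-8 needs to encode one Unicode code point.
function utf8BytesFor(int $codePoint): int
{
    if ($codePoint < 0x80) {
        return 1;                 // U+0000..U+007F (ASCII)
    }
    if ($codePoint < 0x800) {
        return 2;                 // covers U+0080..U+00FF (upper half of Latin-1)
    }
    if ($codePoint < 0x10000) {
        return 3;
    }
    return 4;
}

// Hypothetical helper: UTF-8 size of an ISO 8859-1 string.
// In Latin-1 each byte value is also the character's Unicode code point.
function utf8LengthOfLatin1(string $latin1): int
{
    $total = 0;
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        $total += utf8BytesFor(ord($latin1[$i]));
    }
    return $total;
}

$latin1 = "\xE1a";                        // 'áa' as stored on the server
echo strlen($latin1), "\n";               // 2 bytes: fits VARCHAR(2)
echo utf8LengthOfLatin1($latin1), "\n";   // 3 bytes once converted to UTF-8

// Simulated client buffer sized from the server's byte count (2 bytes):
// only the UTF-8 encoding of 'á' (C3 A1) survives.
$utf8 = "\xC3\xA1a";
echo bin2hex(substr($utf8, 0, 2)), "\n";  // c3a1
```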

    The codeset conversion between UTF-8 and 8859-1 occurs in a library called GLS (Global Language Support) inside the Informix ClientSDK (CSDK) code that is used by PDO Informix.
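    As a rough illustration of what that conversion does to the bytes, here is a pure-PHP sketch of an ISO 8859-1 to UTF-8 conversion. latin1ToUtf8() is a hypothetical helper written for illustration; it is not the GLS API and covers only this one code set pair.

```php
<?php
// Hypothetical helper: convert ISO 8859-1 bytes to UTF-8 (not the GLS API).
function latin1ToUtf8(string $latin1): string
{
    $out = '';
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        $cp = ord($latin1[$i]);                  // Latin-1 byte value == code point
        if ($cp < 0x80) {
            $out .= chr($cp);                    // ASCII is copied unchanged (1 byte)
        } else {
            // U+0080..U+00FF becomes the two-byte sequence 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

$server = "\xE1a";                 // 'áa' as stored in the Latin-1 database
$client = latin1ToUtf8($server);

echo bin2hex($server), "\n";       // e161   (2 bytes)
echo bin2hex($client), "\n";       // c3a161 (3 bytes)
```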

    This is an interesting setup, with the client and the database server using different code sets. There is a case for the client using bigger variables when a code set conversion is going on. Since the database stores Latin-1, all the characters fall in the Unicode range U+0000..U+00FF, so no value can more than double in size when converted to UTF-8. (If it were Latin-15, the Euro symbol € U+20AC would require 3 bytes in UTF-8, for instance; most of the other 8859-x code sets require one or two bytes per character, I believe.) Handling that sensibly in the code set conversion environment would require some care, but it could be done if the code were aware of the issue. The fix probably belongs in PDO Informix, since it is the component that tells CSDK how much space to use for storing the data, based on the byte count reported by CSDK and the Informix server.
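    The sizing idea can be sketched as follows, under stated assumptions: the helper clientBufferBytes() and its table of worst-case expansion factors are hypothetical (2 bytes per character for ISO 8859-1, 3 for ISO 8859-15 because of the Euro sign), and nothing here reflects how PDO Informix or CSDK actually allocate their buffers.

```php
<?php
// Hypothetical helper: size a client-side buffer for a column whose declared
// length is in bytes of a single-byte server code set, when values are
// converted to UTF-8 on the client.
function clientBufferBytes(int $declaredBytes, string $serverCodeSet): int
{
    // Assumed worst-case UTF-8 bytes per character for each code set.
    $maxUtf8BytesPerChar = [
        'ISO-8859-1'  => 2, // all characters lie in U+0000..U+00FF
        'ISO-8859-15' => 3, // U+20AC EURO SIGN needs 3 bytes
    ];
    if (!array_key_exists($serverCodeSet, $maxUtf8BytesPerChar)) {
        throw new InvalidArgumentException("Unknown code set: $serverCodeSet");
    }
    // One byte per character on the server, so the character count equals the
    // declared byte count; each character may grow by the factor above.
    return $declaredBytes * $maxUtf8BytesPerChar[$serverCodeSet];
}

echo clientBufferBytes(2, 'ISO-8859-1'), "\n";  // 4: enough for 'áá' (C3 A1 C3 A1)
echo clientBufferBytes(2, 'ISO-8859-15'), "\n"; // 6: enough for '€€' (E2 82 AC twice)
```

    A fixed multiplier over-allocates for mostly-ASCII data, but it guarantees that any converted value fits without the client having to inspect the data first.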


    FYI: Informix 9.21 has been out of support for a long time now (so have 9.30, 9.40 and 10.00; even 11.10 is out of support, though that is a relatively recent change). However, that is not a factor in this problem.

    This answer was accepted by the asker as the best answer.
