dsrw29618 2014-07-03 07:19

Encoding error when reading data from one database and inserting it into another with PHP

With PHP, I am trying to read data from Pervasive DB v9.5 and insert it into PostgreSQL 9.3 (encoding: UTF-8) on Windows 2008. I did not choose or design the Pervasive database; I am only reading data from it. With ODBC I can read data from Pervasive and write it to the console with no problem. However, when I try to insert it into PostgreSQL I get:

Warning: pg_execute(): Query failed: ERROR:  invalid byte sequence for encoding "UTF8": 0x94 in file.php on line ..

So Postgres did not like the string I gave it.

Then I ran

var_dump(iconv_get_encoding('all'));

and saw that my encoding is ISO-8859-1,

so I modified the string with

iconv ( 'ISO-8859-1' , 'UTF-8' , $a)

Now the error is gone. However, the string that reaches Postgres is not correct.

The code I used is below. My test string is aöaçaşaıağaüaÖaÇaŞaİaĞaÜ

$a is the string that comes from Pervasive.

echo $a; 

gives aöaçaşaıağaüaÖaÇaŞaİaĞaÜ

echo iconv ( 'ISO-8859-1' , 'UTF-8' , $a)

gives a┬öa┬ça┬şa┬ıa┬ğa┬üa┬Öa┬Ça┬Şa┬İa┬Ğa┬Ü

<?php
//var_dump(iconv_get_encoding('all'));

$conn = pg_connect("host=localhost port=5432 dbname=xxx user=xxx password=".$argv[1]);

$result = pg_prepare($conn, "my_query", 'SELECT * FROM func_my_deneme($1)');

$connect_string = "DRIVER={Pervasive ODBC Client Interface}; SERVERNAME=localhost; SERVERDSN=xxx;";
$pervasiveconn = odbc_connect($connect_string, 'xxx', 'xxx');

$pervasive_result = odbc_exec($pervasiveconn ,"SELECT something");

while(odbc_fetch_row($pervasive_result)){
  $a=odbc_result($pervasive_result,1);

  echo $a;

  $result = pg_execute($conn, "my_query", array(iconv ( 'ISO-8859-1' , 'UTF-8' , $a)));
}
?>

  • dongxiangshen7916 2014-07-03 07:54

    You only seem to be looking at one of the two encoding exchanges here.

    You have:

    (pervasive's native encoding) -> (PHP string)
    

    and

    (PHP string) -> (PostgreSQL)
    

    Of these, you're only explicitly handling the second. You're assuming that the data Pervasive's ODBC driver returns is in PHP's default encoding, which on your system is iso-8859-1.

    Your tests suggest that assumption may be correct, but simply echo'ing the string isn't a good way to tell, because that introduces another encoding step:

    (PHP string) -> (whatever decodes it for viewing)
    

    be that a web browser, terminal or whatever. If the viewer expects an encoding that happens to be the same one Pervasive is using, it will decode the output correctly.

    Try:

    echo $a;
    echo "aöaçaşaıağaüaÖaÇaŞaİaĞaÜ";
    

    and make sure the viewer shows the same value for both. Make sure you edit your source file with the encoding set to iso-8859-1, not some other encoding, so that the literal bytes of the string you paste are correct.

    At that point you should get an error if your editor is set correctly because not all those characters are legal in iso-8859-1. The first invalid one is ş.
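
    One way to check that claim without relying on an editor: a character fits in iso-8859-1 only if its code point is at most U+00FF. A minimal sketch, assuming the snippet itself is saved as utf-8 (the variable names are illustrative, not from the question):

    ```php
    <?php
    // Assumes this file is saved as UTF-8, so the literal below is UTF-8 bytes.
    $test = "aöaçaşaıağaüaÖaÇaŞaİaĞaÜ";

    // With the /u modifier PCRE works on code points; anything above U+00FF
    // has no iso-8859-1 byte.
    if (preg_match('/[^\x{00}-\x{FF}]/u', $test, $m) === 1) {
        $firstInvalid = $m[0];
    } else {
        $firstInvalid = null;
    }

    echo $firstInvalid, "\n"; // ş (U+015F)
    ```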

    So clearly what's coming from Pervasive can't be iso-8859-1. To really print a latin-1 string, you can echo the escaped bytes. For example, this string:

    aöaçaaaüaÖaÇaaaaÜ
    

    in which all chars are legal iso-8859-1, is printed in iso-8859-1 encoding with:

    echo "a\xf6a\xe7aaa\xfca\xd6a\xc7aaaa\xdc";
    

    Here, hex escapes are used to specify non-7-bit characters to unambiguously ensure that the encoding of the byte sequence is what you think without any confusion about text editors etc.
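
    As a small illustration of why escapes help: the hex dump of an escaped string shows exactly one byte per accented character, and converting it from iso-8859-1 gives the two-byte utf-8 forms. A sketch, assuming the iconv extension is available:

    ```php
    <?php
    // "aöaç" written as iso-8859-1 bytes, independent of the editor's encoding.
    $latin1 = "a\xf6a\xe7";
    echo bin2hex($latin1), "\n"; // 61f661e7

    // The conversion is correct here because the input really is iso-8859-1.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    echo bin2hex($utf8), "\n";   // 61c3b661c3a7
    ```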

    Betcha that doesn't print right when you view it, because whatever's reading the input isn't decoding it as iso-8859-1.


    What you should be doing is looking at the bytes of the string you get from Pervasive to see what it really is. Then determining its encoding and decoding it into utf-8, which you can then send to PostgreSQL over a client_encoding = utf-8 connection. @deceze suggested bin2hex for this (I don't speak PHP, so didn't know what to suggest). So show the output of:

    echo bin2hex($a) . "\n";
    
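    To make the failure mode concrete, here is a sketch with a hypothetical sample value: the letter a followed by 0x94, the byte PostgreSQL rejected in the question. `hexBytes` is an illustrative helper, not a PHP built-in, and the real data from Pervasive may look different.

    ```php
    <?php
    // Hypothetical sample: 'a' followed by 0x94, the byte from the error message.
    $a = "a\x94";

    // Illustrative helper: space-separated hex so single bytes are easy to read.
    function hexBytes(string $s): string
    {
        return implode(' ', str_split(bin2hex($s), 2));
    }

    echo hexBytes($a), "\n"; // 61 94

    // Wrongly treating 0x94 as iso-8859-1 maps it to the control char U+0094.
    $converted = iconv('ISO-8859-1', 'UTF-8', $a);
    echo hexBytes($converted), "\n"; // 61 c2 94
    ```

    The converted bytes c2 94 are valid utf-8, so PostgreSQL stops complaining, but they encode a control character rather than a letter. As an aside that the question does not confirm: in DOS code page 857 (Turkish), 0x94 is ö and 0xC2 is ┬, which would match the a┬ö output shown in the question.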

    Or - even better - make sure you determine from the configuration / documentation what the encoding of the data coming from Pervasive is, rather than guessing. Or just force it.

    A quick look at the Pervasive documentation showed that the ODBC Driver has an encoding parameter that takes the code page ID for the desired encoding. So try:

    $connect_string = "DRIVER={Pervasive ODBC Client Interface}; SERVERNAME=localhost; SERVERDSN=xxx; encoding=65001";
    

    (Microsoft, at least, defines 65001 as the codepage for utf-8 per this doc).
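
    If the driver option is not available and you have to convert in PHP, a sketch along these lines may help once the real source encoding is known. `toUtf8` is a hypothetical helper, and the encoding name you pass must be one your iconv build supports; iso-8859-1 appears below only because its mapping is easy to verify by hand.

    ```php
    <?php
    // Hypothetical helper: convert $bytes from a known source encoding to UTF-8
    // and fail loudly instead of handing bad data to PostgreSQL.
    function toUtf8(string $bytes, string $sourceEncoding): string
    {
        // iconv returns false on an unknown encoding or an unconvertible byte.
        $utf8 = @iconv($sourceEncoding, 'UTF-8', $bytes);

        // preg_match with /u only succeeds on a well-formed UTF-8 subject.
        if ($utf8 === false || preg_match('//u', $utf8) !== 1) {
            throw new RuntimeException(
                "Cannot convert from $sourceEncoding: " . bin2hex($bytes)
            );
        }
        return $utf8;
    }

    // Demonstration with an encoding whose result can be checked by hand.
    echo bin2hex(toUtf8("a\xf6", 'ISO-8859-1')), "\n"; // 61c3b6
    ```

    The result of such a helper, rather than the raw ODBC value, is what would go into the array passed to pg_execute.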
