dsrw29618 2014-07-03 07:19
浏览 39
已采纳

从一个数据库读取数据并使用PHP插入另一个数据库时出现编码错误

With PHP, I am trying to read data from Pervasive DB v9.5 and insert it to PostgreSQL 9.3 (encoding: UTF-8) on Windows 2008. I did not choose or code PervasiveDB (I am just reading data from it). With ODBC I read data from Pervasive and write it to console with no problem. However when I try to insert it to Postgre I encounter with

Warning: pg_execute(): Query failed: ERROR:  invalid byte sequence for encoding "UTF8": 0x94 in file.php on line ..

So, I saw that Postgres did not like the string I gave.

Then I use

var_dump(iconv_get_encoding('all'));

and see that my encoding is ISO-8859-1

and modify the string with

iconv ( 'ISO-8859-1' , 'UTF-8' , $a)

Now, the error is gone. However the string which reached to Postgres is not correct.

The code I used is below. And my test string is aöaçaşaıağaüaÖaÇaŞaİaĞaÜ

$a is the string which comes from Pervasive

echo $a; 

gives aöaçaşaıağaüaÖaÇaŞaİaĞaÜ

echo iconv ( 'ISO-8859-1' , 'UTF-8' , $a)

gives a┬öa┬ça┬şa┬ıa┬ğa┬üa┬Öa┬Ça┬Şa┬İa┬Ğa┬Ü

<?php
//var_dump(iconv_get_encoding('all'));

$conn = pg_connect("host=localhost port=5432 dbname=xxx user=xxx password=".$argv[1]);

$result = pg_prepare($conn, "my_query", 'SELECT * FROM func_my_deneme($1)');

$connect_string = "DRIVER={Pervasive ODBC Client Interface}; SERVERNAME=localhost; SERVERDSN=xxx;";
$pervasiveconn = odbc_connect($connect_string, 'xxx', 'xxx');

$pervasive_result = odbc_exec($pervasiveconn ,"SELECT something");

while(odbc_fetch_row($pervasive_result)){
  $a=odbc_result($pervasive_result,1);

  echo $a;

  $result = pg_execute($conn, "my_query", array(iconv ( 'ISO-8859-1' , 'UTF-8' , $a)));
}
?>
  • 写回答

1条回答 默认 最新

  • dongxiangshen7916 2014-07-03 07:54
    关注

    You only seem to be looking at one of the two encoding exchanges here.

    You have:

    (pervasive's native encoding) -> (PHP string)
    

    and

    (PHP string) -> (PostgreSQL)
    

    Of these, you're only explicitly handling the second. You're assuming that the data Pervasive's ODBC driver returns is in PHP's default encoding, which on your system is iso-8859-1.

    Your tests suggest that assumption may be correct, but simply echo'ing the string isn't a good way to tell, because that introduces another encoding step:

    (PHP string) -> (whatever decodes it for viewing)
    

    be that a web browser, terminal or whatever. If the viewer expects some encoding that happens to be the same as Pervasive is using it will decode the output corectly.

    Try:

    echo $a;
    echo "aöaçaşaıağaüaÖaÇaŞaİaĞaÜ";
    

    and make sure the viewer shows the same value for both. Make sure you edit your source file with the encoding set to iso-8859-1, not some other encoding, so that the literal bytes of the string you paste are correct.

    At that point you should get an error if your editor is set correctly because not all those characters are legal in iso-8859-1. The first invalid one is ş.

    So clearly what's coming from Pervasive can't be iso-8859-1. To really print a latin-1 string, you can echo the escaped bytes. For example, this string:

    aöaçaaaüaÖaÇaaaaÜ
    

    in which all chars are legal iso-8859-1, is printed in iso-8859-1 encoding with:

    echo "a\xf6a\xe7aaa\xfca\xd6a\xc7aaaa\xdc"
    

    Here, hex escapes are used to specify non-7-bit characters to unambiguously ensure that the encoding of the byte sequence is what you think without any confusion about text editors etc.

    Betcha that doesn't print right when you view it, because whatever's reading the input isn't decoding it as iso-8859-1.


    What you should be doing is looking at the bytes of the string you get from Pervasive to see what it really is. Then determining its encoding and decoding it into utf-8, which you can then send to PostgreSQL over a client_encoding = utf-8 connection. @deceze suggested bin2hex for this (I don't speak PHP, so didn't know what to suggest). So show the output of:

    echo bin2hex($a) . "
    ";
    

    Or - even better - make sure you determine from the configuration / documentation what the encoding of the data coming from Pervasive is, rather than guessing. Or just force it.

    A quick look at the Pervasive documentation showed that the ODBC Driver has an encoding parameter that takes the code page ID for the desired encoding. So try:

    $connect_string = "DRIVER={Pervasive ODBC Client Interface}; SERVERNAME=localhost; SERVERDSN=xxx; encoding=65001";
    

    (Microsoft, at least, defines 65001 as the codepage for utf-8 per this doc).

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥20 完全没有学习过GAN,看了CSDN的一篇文章,里面有代码但是完全不知道如何操作
  • ¥15 使用ue5插件narrative时如何切换关卡也保存叙事任务记录
  • ¥20 软件测试决策法疑问求解答
  • ¥15 win11 23H2删除推荐的项目,支持注册表等
  • ¥15 matlab 用yalmip搭建模型,cplex求解,线性化处理的方法
  • ¥15 qt6.6.3 基于百度云的语音识别 不会改
  • ¥15 关于#目标检测#的问题:大概就是类似后台自动检测某下架商品的库存,在他监测到该商品上架并且可以购买的瞬间点击立即购买下单
  • ¥15 神经网络怎么把隐含层变量融合到损失函数中?
  • ¥15 lingo18勾选global solver求解使用的算法
  • ¥15 全部备份安卓app数据包括密码,可以复制到另一手机上运行