duandie0884
2010-03-30 07:43 Views: 40
Accepted

Is PHP's serialize() function compatible with UTF-8?

I have a site I want to migrate from ISO to UTF-8.

I have a database record indexed by the following primary key:

s:22:"Informations générales";

The problem is that now (with UTF-8), when I serialize the string, I get:

s:24:"Informations générales";

(Notice that the declared size is now the number of bytes, not the number of characters.)

So this is not compatible with the previous non-UTF-8 records!

Did I do something wrong? How can I fix this?

Thanks


5 answers

  • Accepted answer
    dos71253 2010-03-30 07:51

    The behaviour is completely correct. Two strings with different encodings will generate different byte streams, thus different serialization strings.
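
    A minimal sketch of the point above, assuming the "ISO" encoding in the question is ISO-8859-1. The bytes are written as escape sequences so the result does not depend on how the script file itself is saved; the variable names are only illustrative.

    ```php
    <?php
    // "Informations générales" as explicit byte sequences.
    $latin1 = "Informations g\xE9n\xE9rales";          // ISO-8859-1: each é is 1 byte
    $utf8   = "Informations g\xC3\xA9n\xC3\xA9rales";  // UTF-8: each é is 2 bytes

    // serialize() stores the byte length reported by strlen().
    $serializedLatin1 = serialize($latin1);
    $serializedUtf8   = serialize($utf8);

    echo strlen($latin1), "\n"; // 22
    echo strlen($utf8), "\n";   // 24
    echo substr($serializedLatin1, 0, 5), "\n"; // s:22:
    echo substr($serializedUtf8, 0, 5), "\n";   // s:24:
    ```

    The text is the same, but the byte streams differ, so the serialized forms (and any keys built from them) differ as well.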

  • dongyuan4790 2010-03-30 07:49

    PHP 4 and 5 do not have built-in Unicode support; I believe PHP 6 is starting to add more Unicode support although I'm not sure how complete that is.

  • dtttlua7165 2010-03-30 07:50

    You did nothing wrong. PHP prior to v6 just isn't Unicode-aware, and as such doesn't support it unless you force it to (e.g., via the mbstring extension or other means).

    We wrote our own wrapper around serialize() to remedy this. You could also move to another serialization technique, such as JSON (with json_encode() and json_decode(), available in PHP since 5.2.0).
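
    A rough sketch of the JSON route, not the wrapper mentioned above. The array key `title` is made up for illustration. JSON has no length prefix, so there is no byte count to go stale, but json_encode() expects valid UTF-8 input, so old ISO-8859-1 data would still need converting first.

    ```php
    <?php
    $utf8 = "Informations g\xC3\xA9n\xC3\xA9rales"; // UTF-8 bytes

    // By default non-ASCII characters are escaped as \uXXXX sequences.
    $json = json_encode(array('title' => $utf8));
    echo $json, "\n"; // {"title":"Informations g\u00e9n\u00e9rales"}

    // Decoding returns the original UTF-8 string.
    $back = json_decode($json, true);

    // An ISO-8859-1 byte sequence is not valid UTF-8, so encoding fails.
    $invalid = json_encode("g\xE9n\xE9rales");
    var_dump($invalid); // bool(false) on current PHP versions
    ```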

  • dongmei6426 2011-06-25 23:32

    Dump the database in latin1.

    In the command line:

    sed  -e 's/latin1/utf8/g' -i ./DBNAME.sql
    

    Import the converted file into a new UTF-8 database.

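    Use a PHP script to update each field. Run a query, loop through each field, and update the serialized string using this:

    // Recompute every declared string length as a byte count.
    $str = preg_replace_callback('!s:(\d+):"(.*?)";!s', function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; }, $str);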
    

    After that, I was able to use unserialize(), and everything worked with UTF-8.
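
    The length-fixing step can be tried on the key from the question. This is only a sketch: `fix_serialized_lengths()` is a hypothetical helper name, and `$stale` stands for a record whose bytes are already UTF-8 after the dump conversion but which still carries the old length of 22.

    ```php
    <?php
    // Hypothetical helper: recompute the byte length of every s:N:"..."; token.
    function fix_serialized_lengths($serialized)
    {
        return preg_replace_callback(
            '!s:(\d+):"(.*?)";!s',
            function ($m) {
                return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
            },
            $serialized
        );
    }

    // UTF-8 bytes (24 of them) with the stale ISO-8859-1 length (22).
    $stale = "s:22:\"Informations g\xC3\xA9n\xC3\xA9rales\";";

    $broken = @unserialize($stale);            // false: declared length is wrong
    $fixed  = fix_serialized_lengths($stale);  // length rewritten to 24
    $value  = unserialize($fixed);             // the UTF-8 string

    echo substr($fixed, 0, 5), "\n"; // s:24:
    ```

    The non-greedy pattern stops at the first `";` it meets, so a stored value that itself contains `";` would be cut short. It is worth checking the data for that case before running the update.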

  • doutu7123 2013-10-29 13:52

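    To unserialize a UTF-8 encoded serialized array:

    $array = @unserialize($arrayFromDatabase);
    if ($array === false) {
      // The stored lengths count ISO-8859-1 bytes, so decode before unserializing.
      // Note: utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.
      $array = @unserialize(utf8_decode($arrayFromDatabase));
      if (is_array($array)) {
        // Re-encode the values as UTF-8 (handles a flat array of strings only).
        $array = array_map('utf8_encode', $array);
      }
    }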
    
