douan2907 2009-10-28 11:10
Accepted

Converting two strings to the same byte length

I have two strings in my PHP code: one is a parameter to my method and the other comes from an INI file. They do not compare as equal even though they have the same content, probably because of encoding issues. var_dump reports a length of 23 for the first string and 47 for the second (the output is shown at the end of my question).

How can I make sure both are encoded the same way and end up with the same length, so that the comparison won't fail? Preferably, I would like them to be UTF-8 encoded.

For reference, this is an excerpt from the code:

static function getString($keyword, $file) {
    $lang_handle = parse_ini_file($file, true);

    var_dump($keyword);
    foreach ($lang_handle as $key => $value) {
        var_dump($key);
        if ($key == $keyword) {
            foreach ($value as $subkey => $subvalue) {
                var_dump("\t" . $subkey . " => " . $subvalue);
            }
        }
    }
}

with the following ini:

[clientcockpit/login.php]
header = "Kunden Login"
username = "Benutzername"
password = "Passwort"
forgot = "Passwort vergessen"
login = "Login"

When calling the method with getString("clientcockpit/login.php", "inifile.ini") the output is:

string 'clientcockpit/login.php' (length=23)
string '�c�l�i�e�n�t�c�o�c�k�p�i�t�/�l�o�g�i�n�.�p�h�p�' (length=47)

2 answers

  • dongxun6458 2009-10-28 11:24

    Your INI file seems to be in UTF-16 encoding or similar, using two bytes to represent a single character. I guess that the strange characters in your string are actually NUL bytes (\0).

    PHP's Unicode support is quite poor, and I guess that parse_ini_file() does not handle multibyte encodings properly. It will treat the file as if it were encoded in an ASCII-compatible single-byte encoding, just looking for the special characters [ and ] to detect sections. As a result, the section keys will be corrupted: one byte that actually belongs to [ or ] will become part of the section key:

    UTF-16:    [c]    (3 characters, 6 bytes)
    
    For UTF-16BE (big endian):
    
      Bytes:    00 5B    00 63    00 5D    (6 bytes)
      ASCII:    \0  [    \0  c    \0  ]    (6 characters)
    
    For UTF-16LE (little endian):
    
      Bytes:    5B 00    63 00    5D 00    (6 bytes)
      ASCII:    [  \0    c  \0    ]  \0    (6 characters)
    

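    Assuming ASCII, instead of reading c, parse_ini_file() will read \0c\0 if the source file encoding is UTF-16. That also explains the reported length of 47: each of the 23 characters takes two bytes, plus one stray byte that belongs to a bracket (2 × 23 + 1 = 47).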
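
The effect can be reproduced without an INI file. The following is only a minimal sketch: it assumes the file is UTF-16LE without a byte order mark, and it imitates a byte-oriented parser with strpos()/substr() rather than calling parse_ini_file() itself.

```php
<?php
// The section name as the caller passes it: plain ASCII, 23 bytes.
$ascii = 'clientcockpit/login.php';

// The same section header as it would be stored in a UTF-16LE file (assumed encoding).
$utf16 = iconv('UTF-8', 'UTF-16LE', '[' . $ascii . ']');

// Imitate a byte-oriented parser: take every byte between the '[' byte and the ']' byte.
$start = strpos($utf16, '[') + 1;
$end   = strpos($utf16, ']');
$key   = substr($utf16, $start, $end - $start);

var_dump(strlen($ascii));   // int(23)
var_dump(strlen($key));     // int(47)
var_dump($key === $ascii);  // bool(false)
```

The extracted key starts with the second byte of the opening bracket and then carries two bytes per character, which matches the length of 47 from the question.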

    If you can control the format of your INI file, make sure to save it in UTF-8 or ISO-8859-1 encoding, using your favorite text editor.

    Otherwise you will have to read the file contents using file_get_contents(), convert the encoding (e.g. using iconv()) and pass the result to parse_ini_string(). The drawback is that you will have to detect or hard-code the original file encoding.
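
A rough sketch of that read, convert and parse sequence might look like this. The function name parseIniWithEncoding() and its default of UTF-16LE are assumptions made for illustration, not something taken from the question, and the type declarations require PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the original code): loads an INI file that
// is stored in a known, hard-coded encoding and returns its sections.
function parseIniWithEncoding(string $file, string $sourceEncoding = 'UTF-16LE'): array
{
    $raw = file_get_contents($file);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $file");
    }

    // Convert the whole file to UTF-8 before any parsing happens.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert $file from $sourceEncoding");
    }

    // Drop a byte order mark if the conversion carried one over.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }

    // true = keep the [section] structure, as in the original parse_ini_file() call.
    $data = parse_ini_string($utf8, true);
    if ($data === false) {
        throw new RuntimeException("Cannot parse $file as INI");
    }
    return $data;
}
```

In getString(), the call parse_ini_file($file, true) could then be swapped for parseIniWithEncoding($file), after which the section keys should compare equal to the ASCII keyword.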

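    If the mbstring (multibyte) extension is available on your PHP installation, you can use mb_detect_encoding() and mb_convert_encoding() to do the conversion dynamically. Keep in mind that mb_detect_encoding() only makes a best guess, so the result should be checked.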
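
One possible way to do the dynamic conversion is sketched below. It is a swapped-in variant rather than pure mb_detect_encoding(): it first looks for a byte order mark, because guessing UTF-16 from content alone tends to be unreliable, and only falls back to mb_detect_encoding() for files without one. The helper name toUtf8() and the candidate list (UTF-8, ISO-8859-1) are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns $raw converted to UTF-8 without a byte order mark.
// Assumes the input is UTF-8, UTF-16 with a BOM, or ISO-8859-1 (UTF-32 is not handled).
function toUtf8(string $raw): string
{
    // UTF-8 BOM: already UTF-8, just strip the mark.
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return substr($raw, 3);
    }
    // UTF-16 little endian BOM.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    // UTF-16 big endian BOM.
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }

    // No BOM: let mbstring choose between the assumed candidates (strict mode).
    $detected = mb_detect_encoding($raw, ['UTF-8', 'ISO-8859-1'], true);
    if ($detected === false) {
        throw new RuntimeException('Unable to detect the encoding');
    }
    return mb_convert_encoding($raw, 'UTF-8', $detected);
}
```

The result of toUtf8(file_get_contents($file)) can then be handed to parse_ini_string() as described above. A UTF-16 file saved without a byte order mark would not be recognized by this sketch and would still need a hard-coded encoding.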

    This answer was selected by the asker as the best answer.