dongyonglie5132 2011-07-17 11:36
Accepted

How to detect malformed UTF-8 strings in PHP?

The iconv function sometimes gives me a notice:

Notice:
iconv() [function.iconv]:
Detected an incomplete multibyte character in input string in [...]

Is there a way to detect illegal characters in a UTF-8 string before passing the data to iconv?


  • dongshi1880 2011-07-17 11:41

    First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.

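    You can make use of the UTF-8 validity check that is available in preg_match [PHP Manual] since PHP 4.3.5. With the empty pattern and the u modifier it returns 1 for a valid string; for an invalid string the match fails (0 or false, depending on the PHP version) and the return value itself carries no additional information, so compare against 1:

    $isUTF8 = preg_match('//u', $string) === 1;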
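
    The return value alone does not say why the check failed, but preg_last_error() can distinguish malformed UTF-8 from other PCRE failures. A minimal sketch, assuming a hypothetical helper named utf8_check_detail (not part of PHP):

    ```php
    <?php
    // Hypothetical helper: reports why a string failed the //u check.
    function utf8_check_detail(string $string): string
    {
        // The empty pattern always matches a valid subject, so 1 means valid.
        if (preg_match('//u', $string) === 1) {
            return 'valid';
        }
        // The match failed; ask PCRE what went wrong.
        switch (preg_last_error()) {
            case PREG_BAD_UTF8_ERROR:
                return 'malformed UTF-8';
            default:
                return 'other PCRE error';
        }
    }

    echo utf8_check_detail("caf\xC3\xA9"), PHP_EOL; // complete two-byte sequence
    echo utf8_check_detail("caf\xC3"), PHP_EOL;     // truncated two-byte sequence
    ```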
    

    Another possibility is mb_check_encoding [PHP Manual]:

    $validUTF8 = mb_check_encoding($string, 'UTF-8');
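
    To answer the question directly, such a check can be placed in front of the iconv call so that malformed input never reaches it. A rough sketch, assuming the mbstring extension is loaded; convert_if_valid_utf8 is a hypothetical name and the ISO-8859-1 default is only an illustration:

    ```php
    <?php
    // Hypothetical helper (not part of PHP): converts UTF-8 to $to,
    // or returns null when $string is not valid UTF-8.
    function convert_if_valid_utf8(string $string, string $to = 'ISO-8859-1'): ?string
    {
        if (!mb_check_encoding($string, 'UTF-8')) {
            return null; // malformed input: iconv is never called
        }
        $converted = iconv('UTF-8', $to, $string);
        // iconv returns false on failure; map that to null as well.
        return $converted === false ? null : $converted;
    }

    var_dump(convert_if_valid_utf8("caf\xC3\xA9")); // valid UTF-8
    var_dump(convert_if_valid_utf8("caf\xC3"));     // truncated sequence
    ```

    Note that valid UTF-8 can still contain characters that do not exist in the target encoding; this guard only rules out malformed input.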
    

    Another function you can use is mb_detect_encoding [PHP Manual]:

    $validUTF8 = mb_detect_encoding($string, 'UTF-8', true) !== false;
    

    It's important to set the strict parameter to true.

    Additionally, iconv [PHP Manual] lets you change or drop invalid sequences on the fly by appending //TRANSLIT or //IGNORE to the target encoding. (However, when iconv encounters such a sequence it still raises a notice; this behavior cannot be changed.)

    echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
    echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;
    

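    You can suppress the notice with the @ operator and compare the length of the returned string with the original; the two are equal only if nothing was dropped:

    $validUTF8 = strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));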
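
    As an alternative to @, the notice can be caught with a temporary error handler and turned into an exception. This is only a sketch under that assumption; iconv_strict is a hypothetical wrapper, not a PHP function:

    ```php
    <?php
    // Hypothetical wrapper: runs iconv and throws instead of emitting a notice.
    function iconv_strict(string $from, string $to, string $string): string
    {
        // Temporarily turn any PHP error raised by iconv into an exception.
        set_error_handler(function (int $severity, string $message): bool {
            throw new ErrorException($message, 0, $severity);
        });
        try {
            $result = iconv($from, $to, $string);
        } finally {
            restore_error_handler(); // always restore the previous handler
        }
        if ($result === false) {
            throw new RuntimeException('iconv failed');
        }
        return $result;
    }

    try {
        echo iconv_strict('UTF-8', 'ISO-8859-1', "caf\xC3"), PHP_EOL;
    } catch (Exception $e) {
        echo 'Conversion rejected: ', $e->getMessage(), PHP_EOL;
    }
    ```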
    

    Check the examples on the iconv manual page as well.

    You have not shared the source code that triggers the notice. You should add it if you want a more concrete suggestion.
