dqxsuig64994 2013-07-31 17:22

Is it safe to use strpos with UTF-8 strings?

I have a bunch of strings with different charsets. The $charset variable contains the charset of the current string.

$content = iconv($charset, 'UTF-8', $content);

With this done, is it safe to use strpos, strlen, substr, etc. instead of their multibyte equivalents? I'm asking because I use preg_match a lot as well. If I use PREG_OFFSET_CAPTURE to get the position of a word in the string, that position is a byte offset, so I can't pass it to mb_substr to remove everything before the word.


  • duansaxf095988 2013-07-31 17:50

    That entirely depends on what you want to do. The core functions such as strlen, strpos and substr work on bytes: every number they accept or return is a byte count or a byte offset. The mb_* functions are encoding-aware and work on characters: every number they accept or return is a character count or a character offset.
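
    To make the difference concrete, here is a small sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the variable names are only illustrative.

    ```php
    <?php
    // Assumption: this file is saved as UTF-8 and ext/mbstring is available.
    $str = '漢字';

    // Core functions count bytes. Each of these two characters takes 3 bytes in UTF-8.
    $byteLength = strlen($str);         // 6
    $byteOffset = strpos($str, '字');   // 3

    // mb_* functions count characters once they are told the encoding.
    $charLength = mb_strlen($str, 'UTF-8');           // 2
    $charOffset = mb_strpos($str, '字', 0, 'UTF-8');  // 1

    printf("bytes: %d / %d, characters: %d / %d\n",
        $byteLength, $byteOffset, $charLength, $charOffset);
    ```

    Mixing the two families is where things go wrong: a byte offset of 3 passed to mb_substr would be read as "skip three characters".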

    If you have a safe way of getting a byte offset in a string ("safe" meaning the offset is not in the middle of a multi-byte character) and then, for example, crop everything before that offset using substr, that'll work just fine. For instance:

    $str     = '漢字';
    $offset  = strpos($str, '字');     // byte offset 3: 漢 occupies 3 bytes in UTF-8
    $cropped = substr($str, $offset);  // '字', cut exactly on a character boundary
    

    Works fine.
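
    The same reasoning should apply to the preg_match case from the question. The sketch below assumes $content is already valid UTF-8 (for example after the iconv call) and uses a made-up sample string; the offset reported by PREG_OFFSET_CAPTURE is a byte offset even when the /u modifier is used, so it pairs with substr rather than mb_substr.

    ```php
    <?php
    // Assumption: $content holds valid UTF-8; the sample text is only for illustration.
    $content  = 'これは漢字です';
    $fromWord = $content;  // fallback if the word is not found

    // /u makes PCRE treat pattern and subject as UTF-8,
    // but the captured offset is still counted in bytes.
    if (preg_match('/漢字/u', $content, $matches, PREG_OFFSET_CAPTURE)) {
        $byteOffset = $matches[0][1];

        // substr() also counts in bytes, and a match always starts on a
        // character boundary, so this removes everything before the word.
        $fromWord = substr($content, $byteOffset);
    }

    echo $fromWord, "\n";
    ```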

    However, this won't work:

    $cropped = substr($str, $offset, 1);
    

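    Here substr returns only the first of the three bytes that encode 字, which is not valid UTF-8 on its own. You can't safely cut out a fixed number of bytes without running the risk of cutting into a multi-byte character; a byte length is only safe when it ends on a character boundary.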
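
    If you do need exactly one character starting at a byte offset, a few workarounds are possible. The following is a sketch under the same assumptions as before (UTF-8 source file, mbstring loaded), not the only way to do it.

    ```php
    <?php
    // Assumption: this file is saved as UTF-8 and ext/mbstring is available.
    $str    = '漢字';
    $needle = '字';
    $offset = strpos($str, $needle);  // byte offset

    // Cutting a single byte yields a fragment that is not valid UTF-8.
    $broken  = substr($str, $offset, 1);
    $isValid = mb_check_encoding($broken, 'UTF-8');

    // Option 1: use the byte length of the string that was searched for.
    $whole = substr($str, $offset, strlen($needle));

    // Option 2: crop by bytes first, then take one character with mb_substr.
    $firstChar = mb_substr(substr($str, $offset), 0, 1, 'UTF-8');

    // Option 3: convert the byte offset into a character offset for mb_* calls.
    $charOffset = mb_strlen(substr($str, 0, $offset), 'UTF-8');
    $viaMb      = mb_substr($str, $charOffset, 1, 'UTF-8');

    var_dump($isValid, $whole, $firstChar, $charOffset, $viaMb);
    ```

    Option 3 relies on the byte offset sitting on a character boundary, which holds for offsets returned by strpos or preg_match on valid UTF-8.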

    Accepted by the asker as the best answer.
