dsjswclzh40259075 2013-06-26 07:52

preg_match and UTF-8

Let's say I have the following:

$str1 = "via Tokyo";
$str2 = "via 東京";

I want to match any non-whitespace characters after the "via ". Normally I'd use the following:

preg_match("/via\s(\S+)/", $str2, $match);

to obtain the matching characters. I assumed this wouldn't work with the above because preg_match doesn't understand UTF-8; however, it works perfectly in this case.

Is this working correctly because preg_match is simply looking for bytes that aren't whitespace, and if so, am I safe to use this for any UTF8 characters?

PS I'm aware that I should really be using the mb_ereg functions for this (or avoiding PHP altogether) but I'm looking for a better understanding of why this works. Thanks!


2 answers

  • dqpdb82600 2013-06-26 07:57

    Yes. UTF-8 encodes every non-ASCII character as a multi-byte sequence, and every byte in such a sequence has its high bit set, so none of them can be confused with an ASCII byte. Searching for a slash, backslash or space will therefore never produce a false positive inside a multi-byte sequence.
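
The point about high bits can be checked directly. The following is a minimal sketch, assuming the source file is saved as UTF-8 and PHP runs with its default locale; `all_bytes_high` is a hypothetical helper written for this illustration, not a built-in function. It also shows where the `u` pattern modifier starts to matter: without it, `.` consumes one byte rather than one character.

```php
<?php
// Hypothetical helper (not part of PHP): true when every byte of $s has its
// high bit set, i.e. no byte falls in the ASCII range 0x00-0x7F.
function all_bytes_high(string $s): bool
{
    foreach (str_split($s) as $byte) {
        if ((ord($byte) & 0x80) === 0) {
            return false;
        }
    }
    return true;
}

$str2 = "via 東京";

// Byte-oriented match, exactly as in the question (no /u modifier).
preg_match('/via\s(\S+)/', $str2, $byteMatch);

// Same pattern with /u: pattern and subject are treated as UTF-8.
preg_match('/via\s(\S+)/u', $str2, $utf8Match);

// A single "." shows the difference: one byte without /u, one character with it.
preg_match('/via\s(.)/', $str2, $oneByte);
preg_match('/via\s(.)/u', $str2, $oneChar);

var_dump(bin2hex("東京"));            // string(12) "e69db1e4baac"
var_dump(all_bytes_high("東京"));     // bool(true)
var_dump($byteMatch[1] === "東京");   // bool(true)
var_dump($utf8Match[1] === "東京");   // bool(true)
var_dump(strlen($oneByte[1]));        // int(1), only the first byte of 東
var_dump($oneChar[1] === "東");       // bool(true)
```

So the byte-wise pattern from the question is fine as long as it only distinguishes ASCII whitespace from everything else. Once a pattern needs to count or classify individual characters, adding the `u` modifier is the safer choice.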

    Accepted by the asker as the best answer.