dtn43447 2013-12-29 12:22

Regex matching a word that starts with a Unicode character returns unexpected results

I want to check whether the word 'açilek' occurs in a piece of text. Running this:

$word = 'açilek';
$article='elma  and  açilek word';
$mat=preg_match('/\b'. $word .'\b/', $article);
var_dump($mat);

This succeeds, as expected. However, when I try to match the word 'çilek', preg_match returns 0 (no match), which is not expected:

$word = 'çilek';
$article='elma  and  çilek word';
$mat=preg_match('/\b'. $word .'\b/', $article);
var_dump($mat); // int(0): no match, unexpected

Additionally, the pattern matches when 'çilek' is only part of a longer word, which is also unexpected:

$word = 'çilek';
$article='elma  and  açilek word';
$mat=preg_match('/\b'. $word .'\b/', $article);
var_dump($mat); // int(1): matches inside 'açilek', unexpected

Why am I seeing this behavior?

2 answers

  • dtoq41429 2013-12-29 12:26

    You need to use the /u modifier to make the regex (especially \b) Unicode-aware:

    $mat=preg_match('/\b'. $word .'\b/u', $article);
    

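    Without it, PCRE treats the pattern and the subject as sequences of single bytes, and by default \b counts only ASCII letters, digits, and the underscore as word characters. In UTF-8, ç is encoded as two non-ASCII bytes, so neither of them is a word character. That puts a word boundary between a and çilek (a word character next to a non-word byte) but none between a space and çilek (two non-word characters in a row), which is why the second test fails and the third one matches.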
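
As a minimal sketch of the fix, the three test cases from the question can be rerun with the /u modifier. The helper name containsWord is hypothetical (it does not appear in the original post), and the sketch assumes the source file and the input strings are UTF-8 encoded and that PHP's PCRE was built with Unicode property support. It also passes the word through preg_quote, which is an addition to the answer, so that any regex metacharacters in the word are matched literally.

```php
<?php
// With a UTF-8 encoded source file, 'ç' is two bytes, neither of them ASCII.
echo bin2hex('ç'), "\n"; // c3a7

// Hypothetical helper (not from the original post): whole-word search in UTF-8 text.
function containsWord(string $word, string $text): bool
{
    // preg_quote escapes regex metacharacters in $word; '/' is the delimiter used here.
    // The trailing 'u' makes PCRE read pattern and subject as UTF-8, so \b sees 'ç' as a letter.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/u';

    // preg_match returns 1 on a match, 0 on no match, false on error (e.g. invalid UTF-8).
    return preg_match($pattern, $text) === 1;
}

var_dump(containsWord('açilek', 'elma  and  açilek word')); // bool(true)
var_dump(containsWord('çilek', 'elma  and  çilek word'));   // bool(true)
var_dump(containsWord('çilek', 'elma  and  açilek word'));  // bool(false)
```

Comparing the result with === 1 keeps an error (false, for example on invalid UTF-8 input) from being mistaken for a successful match.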

    Accepted by the asker as the best answer.