doupin1073 2014-05-20 15:22

PHP regex: case-insensitive matching on the Cyrillic charset

I am using preg_replace and preg_match in PHP with the Cyrillic Windows-1251 charset, and I am trying to match a word case-insensitively using the i modifier.

I ran these tests:

$pattern = '/myCyrillicWord1|myCyrillicWord2/i';
$subject = 'Am I able to find MYCyrILlicWord1 or not';
$res = preg_replace($pattern, 'matched', $subject);

On UTF-8:

With the u (UTF-8) modifier in the pattern:

$pattern = '/myCyrillicWord1|myCyrillicWord2/iu';
// $res: 'Am I able to find matched or not'

Without it:

$pattern = '/myCyrillicWord1|myCyrillicWord2/i';
// $res: 'Am I able to find MYCyrILlicWord1 or not'

On Windows-1251:

$pattern = '/myCyrillicWord1|myCyrillicWord2/i';
// $res: 'Am I able to find MYCyrILlicWord1 or not'
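Below is a self-contained reproduction of the three tests. It is a sketch built on stated assumptions. The words 'хлеб' and 'молоко' are hypothetical stand-ins for myCyrillicWord1 and myCyrillicWord2. The file is assumed to be saved as UTF-8. The Windows-1251 strings are written as byte escapes, so they keep that encoding regardless of how the file is saved.

```php
<?php
// Reproduction sketch. Hypothetical stand-ins: 'хлеб' for myCyrillicWord1,
// 'молоко' for myCyrillicWord2. Assumes this file is saved as UTF-8.
$utf8Words   = 'хлеб|молоко';
$utf8Subject = 'Am I able to find ХЛЕБ or not';

// The same strings in Windows-1251, written as byte escapes (one byte per letter).
$cp1251Words   = "\xF5\xEB\xE5\xE1|\xEC\xEE\xEB\xEE\xEA\xEE"; // хлеб|молоко
$cp1251Subject = "Am I able to find \xD5\xCB\xC5\xC1 or not";  // ХЛЕБ

$utf8WithU = preg_replace('/' . $utf8Words . '/iu', 'matched', $utf8Subject);
$utf8NoU   = preg_replace('/' . $utf8Words . '/i', 'matched', $utf8Subject);
$cp1251NoU = preg_replace('/' . $cp1251Words . '/i', 'matched', $cp1251Subject);

echo $utf8WithU, PHP_EOL;                                               // Am I able to find matched or not
echo $utf8NoU === $utf8Subject ? 'no match' : 'matched', PHP_EOL;      // no match
echo $cp1251NoU === $cp1251Subject ? 'no match' : 'matched', PHP_EOL;  // no match
```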

The regex works with UTF-8 but not with Windows-1251. Note that I also tested with Cyrillic characters such as 'х' and 'Х' (which look like the Latin letters 'x' and 'X').
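The byte level suggests why this happens. This is a minimal sketch that assumes a UTF-8 source file and no setlocale() call, so PCRE runs with PHP's default character tables. Without u, PCRE compares raw bytes. Under those tables only ASCII letters have a case partner, so 'х'/'Х' never fold in either encoding, while the Latin look-alikes 'x'/'X' do.

```php
<?php
// Sketch: why /i without /u misses Cyrillic case pairs.
// Assumes this file is saved as UTF-8 and setlocale() has not been called,
// so PCRE uses PHP's default character tables.
$pairs = [
    'UTF-8'        => ['х', 'Х'],       // U+0445 and U+0425, two bytes each
    'Windows-1251' => ["\xF5", "\xD5"], // the same letters, one byte each
];
foreach ($pairs as $charset => [$lower, $upper]) {
    printf("%-12s lower=%s upper=%s\n", $charset, bin2hex($lower), bin2hex($upper));
}
// Prints:
// UTF-8        lower=d185 upper=d0a5
// Windows-1251 lower=f5 upper=d5

// Without /u, PCRE compares bytes, and the default tables only pair ASCII letters.
$cp1251NoU = preg_match("/\xF5/i", "\xD5");  // 0
$utf8NoU   = preg_match('/х/i', 'Х');        // 0
// With /u, the UTF-8 letters are compared as Unicode characters.
$utf8WithU = preg_match('/х/iu', 'Х');       // 1
// The Latin look-alikes are plain ASCII, so /i alone is enough.
$latinNoU  = preg_match('/x/i', 'X');        // 1

var_dump($cp1251NoU, $utf8NoU, $utf8WithU, $latinNoU);
```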

Is this behavior expected?

How can I match my Cyrillic words in Windows-1251 with the case-insensitive modifier?

Many thanks.


1 answer

  • donglong7338 2014-05-20 17:43

    I don't think PCRE supports charsets other than UTF-8 (via the u modifier), so your options are basically:

    • convert everything to UTF-8, process it with the u modifier, and then convert back; or
    • use hand-crafted character classes for case-insensitivity, like /[Дд][Ыы][Кк]/ to match Дык, дыК, etc.
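Both options can be sketched in PHP under a few assumptions. The strings are stored in Windows-1251. The iconv extension is loaded and accepts the name "CP1251". preg_replace_cp1251() and cp1251_caseless_pattern() are hypothetical helper names, not PHP functions. The character-class builder only pairs the Russian letters of Windows-1251 (А-Я, а-я, Ё, ё); other Cyrillic letters in that code page are left as they are.

```php
<?php
// Sketch of both options for strings stored in Windows-1251.
// Assumes the iconv extension is loaded and accepts the name "CP1251".
// preg_replace_cp1251() and cp1251_caseless_pattern() are hypothetical helpers.

// Option 1: convert to UTF-8, run the regex with /u, convert the result back.
// $pattern is in Windows-1251 and must already carry the u modifier (/.../iu).
function preg_replace_cp1251(string $pattern, string $replacement, string $subject): string
{
    $toUtf8 = static function (string $s): string {
        $converted = iconv('CP1251', 'UTF-8', $s);
        if ($converted === false) {
            throw new RuntimeException('Conversion from CP1251 to UTF-8 failed');
        }
        return $converted;
    };

    $result = preg_replace($toUtf8($pattern), $toUtf8($replacement), $toUtf8($subject));
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    $back = iconv('UTF-8', 'CP1251', $result);
    if ($back === false) {
        throw new RuntimeException('Conversion from UTF-8 back to CP1251 failed');
    }
    return $back;
}

// Option 2: build explicit [Xx] classes, like /[Дд][Ыы][Кк]/, byte by byte.
// Only the Russian letters are paired: А-Я (0xC0-0xDF) with а-я (0xE0-0xFF),
// and Ё (0xA8) with ё (0xB8). ASCII letters are left to the /i modifier.
function cp1251_caseless_pattern(string $word): string
{
    $out = '';
    foreach (str_split($word) as $byte) {
        $code = ord($byte);
        if ($code >= 0xC0 && $code <= 0xDF) {
            $other = chr($code + 0x20);
        } elseif ($code >= 0xE0) {
            $other = chr($code - 0x20);
        } elseif ($code === 0xA8 || $code === 0xB8) {
            $other = chr($code ^ 0x10);
        } else {
            $out .= preg_quote($byte, '/');
            continue;
        }
        $out .= '[' . $byte . $other . ']';
    }
    return $out;
}

// Usage: 'хлеб' and 'ХЛЕБ' written as Windows-1251 byte escapes.
$word    = "\xF5\xEB\xE5\xE1";
$subject = "Am I able to find \xD5\xCB\xC5\xC1 or not";

$viaUtf8  = preg_replace_cp1251('/' . $word . '/iu', 'matched', $subject);
$viaClass = preg_replace('/' . cp1251_caseless_pattern($word) . '/i', 'matched', $subject);

echo $viaUtf8, PHP_EOL;  // Am I able to find matched or not
echo $viaClass, PHP_EOL; // Am I able to find matched or not
```

Each option has a cost. The first keeps the regex readable and reuses PCRE's Unicode case folding, but converts three strings in and one out on every call. The second avoids conversion entirely, but it needs a case table for every letter it is expected to fold.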
    This answer was accepted by the asker as the best answer.