duaiwo9093 2014-06-21 21:14

preg_match() and multibyte characters

I'm trying to validate strings containing Polish characters with preg_match(), but it isn't working.

These are my attempts:

  1. Without the u modifier:

    preg_match("@^[0-9A-ZĄąĆćĘęŁłÓóŻżŹźŃńŚś\-\.\, ]{5,35}$@i", $valuesId)
    
  2. With the u modifier:

    preg_match("@^[0-9A-ZĄąĆćĘęŁłÓóŻżŹźŃńŚś\-\.\, ]{5,35}$@iu", $valuesId)
    

But words like Żółkiewski, Zielona Góra or Równina don't match in either case.

Does anybody know how to handle this correctly without changing the server settings?
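For context on the two attempts above: without the `u` modifier, PCRE treats both the pattern and the subject as raw bytes, so a UTF-8 `ó` counts as two bytes and `{5,35}` counts bytes, not letters. With `u`, both are read as UTF-8 code points, and preg_match() returns `false` (not `0`) when the subject is not valid UTF-8. A minimal sketch of both effects, assuming the file is saved as UTF-8; the sample word `Żółw` is only an illustration.

```php
<?php
// Assumes this file is saved as UTF-8, so "Żółw" is 4 characters but 7 bytes.
$word = "Żółw";

// Without /u the pattern works on bytes: 7 bytes, so "exactly 4" fails.
$bytes = preg_match('/^.{4}$/', $word);   // 0

// With /u the pattern works on UTF-8 code points: 4 characters.
$chars = preg_match('/^.{4}$/u', $word);  // 1

// The same word encoded as Windows-1250 (one byte per letter) is not valid
// UTF-8, so with /u preg_match() fails instead of returning 0.
$legacy = preg_match('/^.{4}$/u', "\xAF\xF3\xB3w"); // false

var_dump($bytes, $chars, $legacy);
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```

So a regex with `u` that never matches may actually be failing on input that is not UTF-8 at all; checking for `false` separately from `0` tells the two cases apart.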


  • dongya9904 2014-06-21 21:42

    Are these characters really multi-byte?

    As shown by this online demo, the following code prints 1 three times (preg_match() returns 1 when the pattern matches, 0 when it does not, and false on error):

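    <?php
    // Character class: digits, A-Z (case-insensitive via /i) and Polish letters
    $regex = "@^[0-9A-ZĄąĆćĘęŁłÓóŻżŹźŃńŚś., -]{5,35}$@i";
    echo preg_match($regex, "Żółkiewski") . "\n";
    echo preg_match($regex, "Zielona Góra") . "\n";
    echo preg_match($regex, "Równina") . "\n";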

    Therefore the problem is not the regex but a mismatch between the encoding of the script containing the regex and the encoding of the input fed to it. For instance, your script may be saved in one of the Windows or ISO Central/Eastern European encodings (such as Windows-1250 or ISO-8859-2), in which case these characters are single bytes and not multi-byte at all. Many IDEs and editors can convert a text file's encoding.
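If you cannot control what the input arrives as, one option is to normalize it to UTF-8 before matching with the `u` modifier. A minimal sketch, assuming that any input which is not valid UTF-8 was sent as Windows-1250 (substitute `ISO-8859-2` if that is your source encoding); the helper name `normalizeToUtf8` is hypothetical, and it relies on PHP's iconv extension.

```php
<?php
// Hypothetical helper, assuming any input that is not valid UTF-8 was sent
// as Windows-1250 (swap the charset name if your source uses ISO-8859-2).
function normalizeToUtf8(string $input): string
{
    // The empty pattern with /u matches any valid UTF-8 string and makes
    // preg_match() return false for invalid input.
    if (preg_match('//u', $input) === 1) {
        return $input;
    }
    $converted = iconv('WINDOWS-1250', 'UTF-8', $input);
    if ($converted === false) {
        throw new RuntimeException('Input is neither UTF-8 nor Windows-1250');
    }
    return $converted;
}

// Same character class as above, now with /u because input is normalized.
$regex = "@^[0-9A-ZĄąĆćĘęŁłÓóŻżŹźŃńŚś., -]{5,35}$@iu";

// "Żółkiewski" as Windows-1250 bytes: Ż = 0xAF, ó = 0xF3, ł = 0xB3.
$legacyInput = "\xAF\xF3\xB3kiewski";
echo preg_match($regex, normalizeToUtf8($legacyInput)), "\n"; // 1
```

Guessing the source encoding like this is a stopgap; the durable fix is still to make every layer agree on one encoding.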

    The best choice for future-proofing is to make sure every component of your system uses UTF-8:

    • The script's encoding
    • The Content-Type header sent by the script
    • The HTML meta charset tag
    • The connection to the database
    • The data in the database

    And so on. Addressing how to achieve all of those is a topic for a book chapter and beyond the scope of this question.
