dougai2427 2012-11-08 22:39
Accepted

UTF-8, digits, and regular expressions

This is what I've found in the Kohana3 validator rules:

public static function digit($str, $utf8 = FALSE)
{
    if ($utf8 === TRUE)
    {
        return (bool) preg_match('/^\pN++$/uD', $str);
    }
    else
    {
        return (is_int($str) AND $str >= 0) OR ctype_digit($str);
    }
}

Can someone give an example where passing the $utf8 parameter as true versus false gives different results (to be precise: false positives for $utf8 == false)?

From what I remember, digits are ASCII-safe characters, and no UTF-8 character can be confused with them.

PS: to be even more specific, is it possible to fool this check by passing something that, read as UTF-8, does not look like a number but still passes the check with $utf8 == false?


  • dtp791357 2012-11-08 23:22

    I just gave the second part of your question a bit more alcohol, and my conclusion is that you can't hide an ASCII digit in a UTF-8 sequence. Digits must be the bytes 0x30..0x39, that is, in the bit range 00110000..00111001.

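    Every byte of a multi-byte UTF-8 sequence has its high bit set: the lead byte starts with 110, 1110 or 11110, and each continuation byte starts with 10. A three-byte sequence, for example, looks like

     1110xxxx  10xxxxxx  10xxxxxx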
    

    And therefore the bit pattern of an ASCII digit, which starts with a 0 bit, can't match at any byte position:

     00110000 
     ▲▲        00110000  ▼
               ▲         00110000
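
The bit-level argument can be checked by dumping the bytes of a non-ASCII numeral. This is only a minimal sketch, assuming PHP 7 or later (for the "\u{...}" escape); byte_patterns() and contains_ascii_digit_byte() are hypothetical helper names and are not part of Kohana.

```php
<?php
// Hypothetical helper: returns each byte of $str as an 8-character binary string.
function byte_patterns(string $str): array
{
    $patterns = [];
    foreach (unpack('C*', $str) as $byte) {
        $patterns[] = str_pad(decbin($byte), 8, '0', STR_PAD_LEFT);
    }
    return $patterns;
}

// Hypothetical helper: TRUE if any byte of $str lies in the ASCII digit range 0x30..0x39.
function contains_ascii_digit_byte(string $str): bool
{
    foreach (unpack('C*', $str) as $byte) {
        if ($byte >= 0x30 && $byte <= 0x39) {
            return true;
        }
    }
    return false;
}

// U+0E53 THAI DIGIT THREE is encoded as three bytes, each with the high bit set.
print_r(byte_patterns("\u{0E53}"));
var_dump(contains_ascii_digit_byte("\u{0E53}")); // bool(false)
var_dump(contains_ascii_digit_byte("7"));        // bool(true)
```

The three patterns printed for U+0E53 are 11100000, 10111001 and 10010011, none of which begins with the 0011 prefix that every ASCII digit has.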
    

    So it is impossible for a string to match as digits in Latin-1/ASCII mode while meaning something different to \pN in /u mode. This ignores invalid encodings, of course.
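
To see where the two modes actually differ, the rule from the question can be copied into a standalone function and run against a few inputs. This is a sketch under assumptions: PHP 7 or later with the ctype and PCRE extensions enabled, and sample strings chosen purely for illustration. The only differences it shows go one way: non-ASCII numerals are accepted with $utf8 = TRUE and rejected with $utf8 = FALSE.

```php
<?php
// Standalone copy of the logic quoted in the question, for experimenting outside Kohana.
function digit($str, $utf8 = false)
{
    if ($utf8 === true) {
        // \pN matches any Unicode number character (categories Nd, Nl, No).
        return (bool) preg_match('/^\pN++$/uD', $str);
    }
    // Byte-oriented check: only the ASCII bytes 0x30..0x39 count as digits.
    return (is_int($str) && $str >= 0) || ctype_digit($str);
}

// Illustrative samples (assumptions, not taken from the original post).
$samples = [
    'ASCII digits'        => '12345',
    'Arabic-Indic digits' => "\u{0661}\u{0662}\u{0663}",
    'superscript two'     => "\u{00B2}",
    'letters'             => 'abc',
];

foreach ($samples as $label => $value) {
    printf(
        "%-20s utf8=false: %-5s  utf8=true: %s\n",
        $label,
        var_export(digit($value), true),
        var_export(digit($value, true), true)
    );
}
```

With these samples, the ASCII digits pass in both modes, the Arabic-Indic digits and the superscript two pass only with $utf8 = TRUE, and the letters fail in both.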

    This answer was selected by the asker as the best answer.