2012-06-23 15:34

PHP preg_replace with #U seems to convert strings with special characters into empty strings

I've been investigating this problem for several hours now and narrowed it down to these few lines of code. I know the code isn't perfect, but it's what I've got to work with from the developer. The script is supposed to filter out potentially malicious code. But the problem is that the string seems to become empty whenever someone uses a special character, such as á, ñ, ö, etc.

For example, if someone writes "viva españa", the string goes empty.

If someone writes "viva espana" (without the ñ), it's all good.

The same goes for other special characters. What could be causing this? I have virtually zero knowledge of regular expressions, so these patterns are gibberish to me. What I do know is that when I comment out these lines, the script works both with and without special characters in the string. The moment I uncomment them, it only works without special characters in the string.

Any ideas?

These are the code lines:

  $string = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u', "$1;", $string);
  $string = preg_replace('#(&\#x*)([0-9A-F]+);*#iu', "$1$2;", $string);
  $string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>#iUu', "$1>", $string);

  $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2nojavascript...', $string);
  $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2novbscript...', $string);
  $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*-moz-binding[\x00-\x20]*:#Uu', '$1=$2nomozbinding...', $string);
  $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*data[\x00-\x20]*:#Uu', '$1=$2nodata...', $string);

  $string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])style[^>]*>#iUu', "$1>", $string);

1 answer

  • dongxia2068 2012-06-23 15:37

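    I would suggest not using the u modifier. It tells PCRE to treat both the pattern and the subject string as UTF-8, but these patterns only match characters in the ASCII range. If the subject is not valid UTF-8 (for example, ISO-8859-1 text containing ñ), preg_replace() returns NULL, which is why the string ends up empty.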
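
    A minimal sketch of that behavior, assuming the input arrives as ISO-8859-1 (the question does not say which encoding is actually in use) and that the mbstring extension is available. The variable names are hypothetical, and only the first pattern from the question is reused. It shows the NULL result with the u modifier, the same call without it, and the alternative of converting the input to UTF-8 before filtering:

```php
<?php
// Hypothetical input: "viva españa" encoded as ISO-8859-1, where ñ is the
// single byte 0xF1. That byte sequence is not valid UTF-8.
$latin1 = "viva espa\xF1a";

// With the u modifier, PCRE rejects the invalid subject and
// preg_replace() returns NULL instead of a string.
$withU = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u', '$1;', $latin1);
$sawBadUtf8 = (preg_last_error() === PREG_BAD_UTF8_ERROR);

// Without the u modifier, the same pattern is applied byte by byte.
// Nothing matches here, so the text comes back unchanged.
$withoutU = preg_replace('#(&\#*\w+)[\x00-\x20]+;#', '$1;', $latin1);

// Alternative: keep the u modifier, but make sure the input is valid UTF-8.
// Assumption: any input that is not valid UTF-8 is ISO-8859-1.
$utf8 = mb_check_encoding($latin1, 'UTF-8')
    ? $latin1
    : mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$fixed = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u', '$1;', $utf8);

var_dump($withU, $sawBadUtf8, $withoutU === $latin1, $fixed);
```

    Checking the return value for NULL and inspecting preg_last_error() after each call is a cheap way to notice this failure instead of silently ending up with an empty string.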

