What is the most efficient way to whitelist UTF-8 characters in PHP?

My goal is to protect my web site from attacks by creating a strict whitelist of allowed characters for any and all POST data received from the client side.

This is a piece of cake when staying within ASCII characters. Something like:

if (preg_match('/[^a-zA-Z0-9]/', $stringToTest)) // note: a range like A-z would also let [ \ ] ^ _ ` through
{
   // Battle stations!!
}

However, I need to allow any and all UTF-8 characters, especially Asian character sets like Japanese, Chinese, and Korean. But I don't want to exclude anybody with wacky characters, like Arabic or Russian, or whatever. One world, one love! ;)

How can I allow people to input the characters of their native language while excluding the nasties used in evil scripts, like *, ?, angle brackets, and so on?


4 answers

dongye1934 2011-02-22 05:01


\w will give you word characters (letters, digits, and underscores), which is probably what you're after, and \s matches whitespace.

e.g.

if (preg_match('/[^\w\s]/u', $stringToTest)) // negated class: fires on anything that is NOT a word character or whitespace
{
   // Battle stations!!
}
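
As a variation on the check above, here is a minimal sketch of a whitelist built from Unicode property classes instead of \w. The function name is_whitelisted is hypothetical, and the choice of allowed categories (letters, combining marks, numbers, whitespace) is an assumption about what the asker wants; adjust it to taste. It relies on preg_match returning false when the subject is not valid UTF-8 under the /u modifier, so malformed input is rejected as well.

```php
<?php
// Hypothetical helper (not from the original answer): returns true only when
// $input is valid UTF-8 made up of letters, combining marks, numbers, and whitespace.
function is_whitelisted(string $input): bool
{
    // \p{L} = letters in any script, \p{M} = combining marks, \p{N} = numbers.
    // \A and \z anchor the match to the whole string.
    // With /u, preg_match returns false for malformed UTF-8, so "=== 1" rejects that too.
    return preg_match('/\A[\p{L}\p{M}\p{N}\s]*\z/u', $input) === 1;
}
```

Called as is_whitelisted($_POST['name'] ?? ''), this accepts Japanese, Cyrillic, or Arabic text and rejects strings containing characters such as < or *.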

regular-expressions.info is an excellent reference for this stuff - here and here are a couple of relevant pages :)

edit: some more clarification needed, sorry!

here's what I usually use for CJK:

function get_CJK_ranges() {

    return array(
                "[\x{2E80}-\x{2EFF}]",      # CJK Radicals Supplement
                "[\x{2F00}-\x{2FDF}]",      # Kangxi Radicals
                "[\x{2FF0}-\x{2FFF}]",      # Ideographic Description Characters
                "[\x{3000}-\x{303F}]",      # CJK Symbols and Punctuation
                "[\x{3040}-\x{309F}]",      # Hiragana
                "[\x{30A0}-\x{30FF}]",      # Katakana
                "[\x{3100}-\x{312F}]",      # Bopomofo
                "[\x{3130}-\x{318F}]",      # Hangul Compatibility Jamo
                "[\x{3190}-\x{319F}]",      # Kanbun
                "[\x{31A0}-\x{31BF}]",      # Bopomofo Extended
                "[\x{31F0}-\x{31FF}]",      # Katakana Phonetic Extensions
                "[\x{3200}-\x{32FF}]",      # Enclosed CJK Letters and Months
                "[\x{3300}-\x{33FF}]",      # CJK Compatibility
                "[\x{3400}-\x{4DBF}]",      # CJK Unified Ideographs Extension A
                "[\x{4DC0}-\x{4DFF}]",      # Yijing Hexagram Symbols
                "[\x{4E00}-\x{9FFF}]",      # CJK Unified Ideographs
                "[\x{A000}-\x{A48F}]",      # Yi Syllables
                "[\x{A490}-\x{A4CF}]",      # Yi Radicals
                "[\x{AC00}-\x{D7AF}]",      # Hangul Syllables
                "[\x{F900}-\x{FAFF}]",      # CJK Compatibility Ideographs
                "[\x{FE30}-\x{FE4F}]",      # CJK Compatibility Forms
                "[\x{1D300}-\x{1D35F}]",    # Tai Xuan Jing Symbols
                "[\x{20000}-\x{2A6DF}]",    # CJK Unified Ideographs Extension B
                "[\x{2F800}-\x{2FA1F}]"     # CJK Compatibility Ideographs Supplement
    );

}

function contains_CJK($string) {
    $regex = '/'.implode('|',get_CJK_ranges()).'/u';
    return preg_match($regex,$string);
}
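
A more compact option, sketched here under the assumption that only the main writing scripts matter, is to use PCRE's Unicode script properties instead of explicit ranges. The function name contains_cjk_script is hypothetical. Note that it is not equivalent to the range list above: blocks such as Bopomofo, Yi, CJK punctuation, and the hexagram symbols are not covered by these four scripts.

```php
<?php
// Hypothetical alternative to contains_CJK(): matches on Unicode script
// properties rather than hand-maintained code point ranges.
function contains_cjk_script(string $input): bool
{
    // \p{Han} covers Chinese characters (hanzi, kanji, hanja);
    // \p{Hiragana} and \p{Katakana} cover Japanese kana; \p{Hangul} covers Korean.
    // The /u modifier is required for \p{...} to work on UTF-8 input.
    return preg_match('/[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/u', $input) === 1;
}
```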

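To get at everything that could be a problem for escaping and other black-hat stuff, use:

/[^\p{P}]/u (matches any character that is not punctuation; \p{P} is the short name of the Unicode punctuation category, and the long Perl-style name \p{Punctuation} is generally not accepted by PCRE, so stick to the short form. Drop the ^ to match the punctuation itself.)

/[^\x21-\x7E]/ ( == /[^!-~]/, i.e. anything outside the printable ASCII range from ! at 0x21 to ~ at 0x7E )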
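
One detail worth checking before relying on a punctuation class: in Unicode, the angle brackets < and > are classified as math symbols (category S), not punctuation (category P), while * and ? are punctuation. The sketch below, with the hypothetical name find_suspect_chars, lists which characters of an input fall into the punctuation, symbol, or control/other categories, so the effect of a pattern can be inspected. Which categories to treat as suspect is an assumption, not something the answer specifies.

```php
<?php
// Hypothetical helper: returns the distinct characters of $input that belong to
// the Unicode punctuation (P), symbol (S), or control/other (C) categories.
function find_suspect_chars(string $input): array
{
    $count = preg_match_all('/[\p{P}\p{S}\p{C}]/u', $input, $matches);
    if ($count === false) {
        // With /u, malformed UTF-8 makes preg_match_all fail; treat it as an error
        // rather than silently reporting "nothing suspicious".
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Keep first occurrences only and reindex from 0.
    return array_values(array_unique($matches[0]));
}
```

For example, find_suspect_chars('a<b>*c?') reports <, >, * and ?, whereas a pattern using \p{P} alone would miss the two angle brackets. Line breaks are control characters, so multi-line input will be flagged unless \p{C} is removed or whitespace is allowed separately.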

another good link

This answer was selected by the asker as the best answer.
