preg_match与UTF8

Let's say I have the following:

$str1 = "via Tokyo";
$str2 = "via 東京";

I want to match any non-whitespace characters after the "via ". Normally I'd use the following:

preg_match("/via\s(\S+)/", $str2, $match);

to obtain the matching characters. I assumed this wouldn't work with the above due to preg_match not understanding utf8, however it works perfectly in this case.

Is this working correctly because preg_match is simply looking for bytes that aren't whitespace, and if so, am I safe to use this for any UTF8 characters?

PS I'm aware that I should really be using the mb_ereg functions for this (or avoiding PHP altogether) but I'm looking for a better understanding of why this works. Thanks!

写回答
好问题 0 提建议
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

2条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dqpdb82600 2013-06-26 07:57
关注
Yes, UTF-8 uses multi-byte sequences for the special Unicode characters, and it guarantees that they are different from the ASCII ones by having a high bit (undermore). So searching for slash, backslash or space will never have a false positive in a multi-byte sequence.

本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报
编辑

预览
轻敲空格完成输入
显示为

卡片

标题

链接
评论

按下Enter换行，Ctrl+Enter发表内容

查看更多回答(1条)

编辑

预览

报告相同问题？

关注问题

utf-8的preg_match规则 php
2013-01-24 14:19

回答 3 已采纳 This will give "0", cos مسیح ارسطوئی is not containing only 3-10 chars; $x = preg_match('~^([\pL]
在PHP中使用preg_match的UTF正则表达式 php
2017-09-24 00:37

回答 2 已采纳 The problem occurs because your csv file starts with a UTF-8 BOM. If you remove this, the regex wo
preg_match_all（）和UTF8字符问题 - 最简单的方法 php
2011-08-03 08:20

回答 2 已采纳 Try... preg_match_all('#[a-z\x{0105}\x{015B}\x{0107}\x{0142}\x{00F3}\x{017C}\x{017A}\x{0144}]{3,}
PHP中preg_match正则匹配中的/u、/i、/s含义
2020-12-17 06:31

$utf8_length = strlen(preg_replace('/^[\x00-\x7F]*$/u', '', $str)); ``` 总的来说，了解并熟练使用这些修饰符对于编写高效的PHP正则表达式非常重要，它们可以根据不同的需求调整匹配规则，以适应各种复杂的文本...
PHP preg_match不允许æøå php
2017-01-14 14:02

回答 1 已采纳 If your PHP file is already UTF-8 encoded, you should tell your database, that you need UTF-8. Ins
如何在preg_match中允许utf-8字符集？ php
2010-05-28 22:56

回答 1 已采纳 Try the /u pattern modifier to let the PCRE engine know you're using UTF-8 patterns: This modi
preg_replace + UTF-8在一台服务器上不起作用，但在另一台服务器上起作用 php
2012-04-08 07:14

回答 2 已采纳 There may be difference between PCRE versions which PHP use. PHP and PCRE versions: http://php.ne
php中pregmatch,php小经验:解析preg_match与preg_match_all 函数
2021-03-26 03:59

Kang He的博客正则表达式在 PHP 中的应用在 PHP 应用中，正则表达式主要用于：•正则匹配：根据正则表达式匹配相应的内容•正则替换：根据正则表达式匹配内容并替换•正则分割：根据正则表达式分割字符串在 PHP 中有两类正则...
preg_match函数在某些PHP脚本中无法正常工作 php
2010-11-12 20:36

回答 6 已采纳 This might help: http://www.phpwact.org/php/i18n/charsets
php preg_match找到<img>标签，但没有gif扩展名 php
2013-02-18 00:50

回答 3 已采纳 Don't do it that way. Attempting to parse HTML with regex is a task doomed to failure, since a sl
无法理解两个preg_match模式之间的区别 php
2012-01-25 06:10

回答 2 已采纳 The modifier is needed to process utf-8 encoded input properly. A pattern like \xC1 should match t
php ereg preg_match,正则表达式 preg_match()与ereg()函数
2021-04-12 12:14

鸭梨梨呐的博客作用：分割，匹配，查找，替换例如：验证邮箱地址格式，手机号码格式等等php中常用的正则函数：preg_match(mode, string subject, array matches); 更加规范执行效率越高ereg(mode, string ...
php preg_match 漏洞,PHP preg_match()函数信息泄露漏洞
2021-04-20 06:45

郭之然的博客发布日期：2009-09-27更新...PHP所使用的preg_match()函数从用户输入字符串获得参数，如果所传送的值为数组而不是字符串就会生成警告，警告消息中包含有当前运行脚本的完整路径。链接：http://marc.info/?l=bugtr...
基于preg_match_all采集后数据处理的一点心得笔记(编码转换和正则匹配)
2020-10-26 00:14

4. **正则表达式匹配与preg_match_all**： - `preg_match_all`函数用于在整个字符串中查找所有符合正则表达式的部分，并返回匹配结果。 - 参数包括：模式（pattern）、目标字符串（subject）、匹配结果数组...
PHP preg_match的匹配多国语言的技巧
2020-10-26 19:06

PHP中提供了强大的正则表达式处理函数preg_match，它允许我们在字符串中执行搜索以找到与特定模式匹配的文本。要实现对多国语言的支持，需要在使用preg_match时特别注意正则表达式的模式以及相关的参数。首先，来...
js 类似php preg_match 的方法,如何在JavaScript中使用类似于PHP的preg_match_all（）的正则表达式匹配...
2021-05-04 18:52

力噜噜噜力的博客斯蒂芬大帝我建议使用替代正则表达式，使用子组单独捕获参数的名称和值：function getUrlParams(url) { var re = /(?...([^]*))/g, match, params = {}, decode = function (s) {return decodeURIComponent(s.r...
PHP中preg_match_all正则匹配出需要的内容
2020-08-19 03:45

夏已微凉、的博客是特殊字符，所以要加上转义字符 \ 3、匹配量词中的一个【桶，盒，对，只，根，条】 [桶|盒|对|只|根|条] 4、匹配空格0个或多个 \s* 或者 \s{0,} 5、针对汉字匹配 /u /u 表示按unicode(utf-8)匹配（主要针对多字节...
php preg_match的匹配不同国家语言实例
2020-10-20 09:59

在PHP中，标准的`preg_match`函数默认使用的是UTF-8编码，这就意味着它能够正确处理包含多种语言的字符串。然而，当使用特定的正则表达式语法时，可能需要特别指定以支持Unicode。在PHP代码示例中，`preg_match("/...
preg_match匹配的正则乱码问题！
2022-09-19 06:00

Liusloan的博客 preg_match匹配的正则中可以是中文也可以是纯英文的字符，但如果混合会出现乱码。preg_match("/我的php/", "我的php技术"）preg_match("/我的/", "我的php技术"）
PHP 正则preg_match 与 preg_match_all 函数以及匹配中文
2017-08-16 03:40

youcijibi的博客 PHP 正则表达式匹配 preg_match 与 preg_match_all 函数正则表达式在 PHP 中的应用在 PHP 应用中，正则表达式主要用于：正则匹配：根据正则表达式匹配相应的内容正则替换：根据正则表达式匹配内容并替换正则...
没有解决我的问题, 去提问

preg_match与UTF8

2条回答 默认 最新

2条回答默认最新