dsk710351 2018-08-21 11:06
Views: 146
Accepted

PHP: How to match a range of Unicode surrogate-pair emoticons/emoji?

anubhava's answer about matching ranges of Unicode characters led me to a regex for stripping a specific range of characters that are each a single code point. With it, I can now match all the miscellaneous symbols in this list (which includes emoticons) with this simple expression:

preg_replace('/[\x{2600}-\x{26FF}]/u', '', $str);
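As a minimal sketch of what that expression does, the snippet below runs it on a made-up input string. The sample text and the variable names are assumptions for illustration, and the `"\u{...}"` escapes used to build the input need PHP 7.0 or later.

```php
<?php
// Hypothetical sample input containing two characters from the
// Miscellaneous Symbols block: U+2600 (sun) and U+2614 (umbrella).
$str = "Sunny \u{2600} then rain \u{2614} later";

// The /u modifier makes PCRE treat pattern and subject as UTF-8,
// so \x{2600}-\x{26FF} is a range of code points, not of bytes.
$clean = preg_replace('/[\x{2600}-\x{26FF}]/u', '', $str);

// Only the two symbols are removed; the surrounding spaces stay.
echo $clean, PHP_EOL;
```

Note that the pattern is in single quotes here, so PHP passes `\x{2600}` through untouched and PCRE interprets it.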

However, I also want to match the emoji in this list that UTF-16 encodes as surrogate pairs, but as nhahtdh explained in a comment:

There is a range from d800 to dfff to specify surrogates in UTF-16 to allow for more characters to be specified. A single surrogate is not a valid character in UTF-16 (a pair is necessary to specify a valid character).
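To make the quoted point concrete, here is a small sketch of the standard UTF-16 arithmetic that turns a high/low surrogate pair into the single code point it stands for. The function name `surrogatePairToCodePoint` is a hypothetical helper introduced for illustration, not a PHP built-in.

```php
<?php
// Combine a UTF-16 high surrogate (D800-DBFF) and low surrogate (DC00-DFFF)
// into the single Unicode code point they encode together.
// surrogatePairToCodePoint() is a hypothetical helper, not part of PHP.
function surrogatePairToCodePoint(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

// The pair D83D DE00 encodes one code point above U+FFFF.
$codePoint = surrogatePairToCodePoint(0xD83D, 0xDE00);

printf("U+%X\n", $codePoint); // prints "U+1F600"
```

Because the two halves only have meaning together in UTF-16, a UTF-8 pattern would refer to the combined code point (here U+1F600) rather than to the halves.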

So, for example, when I try this:

preg_replace('/\x{D83D}\x{DE00}/u', '', $str);

For replacing only the first of the paired surrogates on this list, i.e.:


1 answer

  • douzhuo2722 2018-08-21 14:43

    revo's comment above was very helpful in finding a solution:

    If your PHP isn't shipped with a PCRE build for UTF-16 then you can't perform such a match. From PHP 7.0 on, you're able to use Unicode code points following this syntax \u{XXXX} e.g. preg_replace("~\u{1F600}~", '', $str); (Mind the double quotes)

    Since I am using PHP 7, echo "\u{1F602}"; outputs 😂
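The two escape styles mentioned in this thread can be compared side by side. The sketch below is one possible way to use them, not necessarily the exact code the answerer ended up with. It assumes PHP 7.0 or later, a made-up input string, and that the emoji of interest fall in the Unicode Emoticons block (U+1F600 to U+1F64F); other emoji blocks would need their own ranges.

```php
<?php
// Hypothetical sample input with two emoji from the Emoticons block.
$str = "Hello \u{1F600} world \u{1F602}!";

// 1) Single emoji: inside double quotes, PHP itself expands "\u{1F600}"
//    to its UTF-8 bytes before PCRE ever sees the pattern.
$one = preg_replace("~\u{1F600}~", '', $str);

// 2) Range of emoji: inside single quotes, \x{...} reaches PCRE unchanged,
//    and the /u modifier lets it address code points above U+FFFF directly,
//    so no surrogate halves are needed.
$all = preg_replace('~[\x{1F600}-\x{1F64F}]~u', '', $str);

echo $one, PHP_EOL; // U+1F600 removed, U+1F602 still present
echo $all, PHP_EOL; // both emoji removed
```

The design point is who interprets the escape: `"\u{...}"` is a PHP string feature, while `\x{...}` with `/u` is a PCRE feature, which is why the quoting style matters.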

    This answer was selected as the best answer by the asker.
