dsk61780 2014-01-05 07:15

How can I use the PHP preg_replace function to convert Unicode code points to actual characters / HTML entities?

I want to convert a set of Unicode code points in string format to actual characters and/or HTML entities (either result is fine).

For example, if I have the following string assignment:

$str = '\u304a\u306f\u3088\u3046';

I want to use the preg_replace function to convert those Unicode code points to actual characters and/or HTML entities.

As per other Stack Overflow posts I saw for similar issues, I first attempted the following:

$str = '\u304a\u306f\u3088\u3046';
$str2 = preg_replace('/\u[0-9a-f]+/', '&#x$1;', $str);

However, whenever I attempt to do this, I get the following PHP error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u

I tried all sorts of things like adding the u flag to the regex or changing /\u[0-9a-f]+/ to /\x{[0-9a-f]+}/, but nothing seems to work.

Also, I've looked at all sorts of other relevant pages/posts I could find on the web related to converting Unicode code points to actual characters in PHP, but either I'm missing something crucial, or something is wrong because I can't fix the issue I'm having.

Can someone please offer me a concrete solution on how to convert a string of Unicode code points to actual characters and/or a string of HTML entities?

2 answers

  • doulan3966 2014-01-05 07:26

    From the PHP manual:

    Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.

    First of all, in your regular expression, you're only using one backslash (\). As explained in the PHP manual, you need to use \\\\ to match a literal backslash (with some exceptions).
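
To see what the regex engine actually receives, it can help to print the pattern string. A minimal sketch, using only the pattern from this answer (the variable name $pattern is illustrative):

```php
<?php
// A single-quoted PHP string turns each "\\" pair into one backslash,
// so four backslashes in the source become two in the pattern PCRE sees.
$pattern = '/\\\\u([0-9a-f]+)/i';

echo $pattern, PHP_EOL;        // prints: /\\u([0-9a-f]+)/i
echo strlen('\\\\'), PHP_EOL;  // prints: 2

// PCRE reads "\\" as an escaped backslash, i.e. one literal "\" in the subject.
var_dump(preg_match($pattern, '\u304a'));  // int(1)
```

With only one or two backslashes in the source, PCRE would receive \u, which is the unsupported escape named in the warning.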

    Second, your original expression has no capturing group. preg_replace() replaces each full match of the pattern with the replacement string; inside the replacement, $1 refers to the text matched by the first capturing group. Without a group around the hex digits, $1 is empty and the digits are lost.
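
A small sketch of the difference the capturing group makes (the variable names are illustrative, and the input is shortened to two code points):

```php
<?php
$str = '\u304a\u306f';

// No capturing group: $1 refers to a group that does not exist,
// so it is replaced with an empty string.
$withoutGroup = preg_replace('/\\\\u[0-9a-f]+/i', '&#x$1;', $str);

// With a capturing group around the hex digits, $1 holds those digits.
$withGroup = preg_replace('/\\\\u([0-9a-f]+)/i', '&#x$1;', $str);

echo $withoutGroup, PHP_EOL;  // &#x;&#x;
echo $withGroup, PHP_EOL;     // &#x304a;&#x306f;
```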

    The updated regular expression with proper escaping and correct capturing groups would look like:

    $str2 = preg_replace('/\\\\u([0-9a-f]+)/i', '&#x$1;', $str);
    

    Output (the resulting string is &#x304a;&#x306f;&#x3088;&#x3046;, which a browser renders as):

    おはよう
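
If actual characters are wanted rather than entities, one option is to decode the entities afterwards. A minimal sketch, assuming the source file and the desired output are both UTF-8; html_entity_decode() is a standard PHP function, but this step is not part of the original answer:

```php
<?php
$str = '\u304a\u306f\u3088\u3046';

// Step 1: rewrite each \uXXXX escape as a hexadecimal HTML entity.
$entities = preg_replace('/\\\\u([0-9a-f]+)/i', '&#x$1;', $str);
echo $entities, PHP_EOL;  // &#x304a;&#x306f;&#x3088;&#x3046;

// Step 2 (optional): decode the entities into real UTF-8 characters.
$chars = html_entity_decode($entities, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $chars, PHP_EOL;     // おはよう
```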
    

    Expression: \\\\u([0-9a-f]+)

    • \\\\ - matches a literal backslash
    • u - matches the literal u character
    • ( - beginning of the capturing group
      • [0-9a-f]+ - character class followed by the + quantifier -- matches a hexadecimal digit (0-9 or a-f) one or more times
    • ) - end of capturing group
    • i modifier - used for case-insensitive matching

    Replacement: &#x$1;

    • & - literal ampersand character (&)
    • # - literal pound character (#)
    • x - literal character x
    • $1 - contents of the first capturing group -- in this case, strings of the form 304a etc.
    • ; - literal semicolon that terminates the HTML entity

    RegExr Demo.
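
As an alternative that produces the characters directly, the escapes can be decoded in a callback. This is a sketch under stated assumptions, not the method from the answer above: it assumes PHP 7 or later with the json extension available, and decodeUnicodeEscapes is a hypothetical helper name. It relies on the fact that \uXXXX is also the escape syntax of JSON strings.

```php
<?php
// Hypothetical helper name; not from the original answer.
function decodeUnicodeEscapes(string $input): string
{
    // Match runs of \uXXXX escapes (exactly four hex digits each) so that
    // surrogate pairs such as \ud83d\ude00 are decoded together.
    return preg_replace_callback(
        '/(?:\\\\u[0-9a-f]{4})+/i',
        function (array $m): string {
            // The match contains only \uXXXX sequences, so wrapping it in
            // double quotes yields a JSON string literal.
            $decoded = json_decode('"' . $m[0] . '"');
            // Leave the text untouched if decoding fails (e.g. a lone surrogate).
            return is_string($decoded) ? $decoded : $m[0];
        },
        $input
    );
}

echo decodeUnicodeEscapes('\u304a\u306f\u3088\u3046'), PHP_EOL; // おはよう
```

Text outside the matched escapes is passed through unchanged, so quotes or other characters elsewhere in the input never reach json_decode().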

    This answer was selected by the asker as the best answer.