dongpin3794 2018-11-08 17:14

Regex to remove complete HTML entities

We have a requirement to remove special characters from text strings. For example, we may get a string like the one below, where &#174; is the entity for the registered trademark symbol (®):

PEPSI&#174; Bottle 20 oz<br><br>

I'm not great with regex, and can't figure out how to edit the existing code to handle this.

Here's what we currently have:

$ui = "PEPSI&#174; Bottle 20 oz<br><br>";
$ui = preg_replace('/[^A-Za-z0-9\.\' -]/', '', $ui);

This results in PEPSI174 Bottle 20 ozbrbr.

Our desired result is PEPSI Bottle 20 oz<br><br>.

How can I edit the regex to make sure that

  1. It doesn't remove valid HTML tags like <br>, and
  2. If it does find a special character entity, it removes not only the special characters (the & and #), but also the numbers and semicolon?

We don't want to have it remove all the numbers, as obviously the string can contain numbers; it's only numbers that are part of the entity code that we need to remove.

1 answer

  • dongman5539 2018-11-08 17:25

    You could use this, though I can't guarantee it covers every possible HTML entity:

    $res = preg_replace('/&[A-Za-z0-9#]+;/', '', $ui);
    

    That pattern replaces any substring that:

    - starts with &,
    - continues with one or more alphanumeric characters or #, in any order, and
    - ends with ;.
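
    A minimal sketch of how this plays out on the sample string from the question. The function names are hypothetical, and the stricter second pattern is an assumption added for comparison rather than part of the answer; it only accepts the decimal, hexadecimal, and named entity forms.

```php
<?php
// Sample input from the question; &#174; is the numeric entity for the ® symbol.
$ui = "PEPSI&#174; Bottle 20 oz<br><br>";

// Pattern from the answer: "&", then one or more of [A-Za-z0-9#], then ";".
function strip_entities_loose(string $s): string
{
    return preg_replace('/&[A-Za-z0-9#]+;/', '', $s);
}

// Hypothetical stricter variant (an assumption, not part of the answer):
// accepts only decimal (&#174;), hexadecimal (&#xAE;), or named (&reg;) forms.
function strip_entities_strict(string $s): string
{
    return preg_replace('/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/', '', $s);
}

echo strip_entities_loose($ui), PHP_EOL;   // prints: PEPSI Bottle 20 oz<br><br>
echo strip_entities_strict($ui), PHP_EOL;  // prints: PEPSI Bottle 20 oz<br><br>
```

    Neither pattern touches tags such as <br>, and a bare ampersand with no closing semicolon (for example AT&T) is left alone. The loose pattern also removes malformed sequences such as &#1#2;, which the stricter one leaves in place.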

    Accepted by the asker as the best answer.
