dongxin5054 2012-05-06 16:16

Regex to match text inside HTML tags

I'm trying to write a regex that will remove HTML tags around a placeholder text, so that this:

<p>
    Blah</p>
<p>
    {{{body}}}</p>
<p>
    Blah</p>

Becomes this:

<p>
    Blah</p>
{{{body}}}
<p>
    Blah</p>

My current regex is /<.+>.*\{\{\{body\}\}\}<\/.+>/msU. However, it will also remove the contents of the tag preceding the placeholder, resulting in:

{{{body}}}
<p>
    Blah</p>
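
The behaviour described above can be reproduced with a short sketch. It is written in Python purely for illustration (the question does not name a language), and since Python's `re` has no counterpart to the `U` modifier, the lazy quantifiers are spelled out by hand. The names `BROAD_PATTERN`, `SAMPLE` and `strip_with_broad_pattern`, and the choice to substitute the bare placeholder for the match, are assumptions made for this example.

```python
import re

# Approximation of /<.+>.*\{\{\{body\}\}\}<\/.+>/msU:
# the U modifier flips greediness, so every quantifier is written as lazy here.
BROAD_PATTERN = re.compile(r"<.+?>.*?\{\{\{body\}\}\}</.+?>", re.S | re.M)

SAMPLE = "<p>\n    Blah</p>\n<p>\n    {{{body}}}</p>\n<p>\n    Blah</p>\n"


def strip_with_broad_pattern(html: str) -> str:
    """Replace whatever the broad pattern matches with the bare placeholder."""
    return BROAD_PATTERN.sub("{{{body}}}", html)


if __name__ == "__main__":
    # The match starts at the very first "<" in the input, because a regex
    # engine reports the leftmost match and ".*?" (with dotall) is allowed to
    # stretch across the first paragraph until the placeholder is reached.
    print(strip_with_broad_pattern(SAMPLE))
```

Laziness only limits how far a quantifier extends once a match has started; it does not move the starting point to the tag nearest the placeholder, which is why the first paragraph is swallowed as well.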

I can't assume the users will always place the placeholder inside <p>, so I would like it to remove any pair of tags immediately around the placeholder. I would appreciate some help with correcting my regex.

[EDIT]

I think it's important to note that the input may or may not be processed by CKEditor. It adds newlines and tabs to the opening tags, thus the regex needs to go with the /sm (dotall + multiline) modifiers.


  • douliedai4838 2012-05-06 16:20

    Try this:

    <[^>]+>\s*\{{3}body\}{3}\s*<\/[^>]+>
    

    See it here in action: http://regexr.com?30s4o

    Here's the breakdown:

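    • <[^>]+> matches a single tag: a <, one or more characters that are not >, then the closing >. Unlike <.+>, it cannot run past the end of that tag.
    • \s* matches any run of whitespace, including none (roughly equivalent to [ \t\r\n\f]*), so the newlines and tabs after the opening tag are covered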
    • \{{3} matches a { exactly 3 times
    • body matches the string literally
    • \}{3} matches a } exactly 3 times
    • \s* again matches any whitespace
    • <\/[^>]+> matches a closing HTML tag
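
As a minimal sketch of how the pattern broken down above might be applied, here is a Python version. The language is an assumption, as are the names `TAG_PAIR_AROUND_BODY` and `unwrap_placeholder` and the decision to replace the whole match with the bare placeholder. The `\/` is written as a plain `/` because Python patterns have no delimiter to escape.

```python
import re

# One tag, optional whitespace, the literal {{{body}}}, optional whitespace,
# then one closing tag. No dotall flag is needed: \s already matches newlines.
TAG_PAIR_AROUND_BODY = re.compile(r"<[^>]+>\s*\{{3}body\}{3}\s*</[^>]+>")


def unwrap_placeholder(html: str) -> str:
    """Drop the pair of tags immediately surrounding {{{body}}}."""
    return TAG_PAIR_AROUND_BODY.sub("{{{body}}}", html)


if __name__ == "__main__":
    sample = "<p>\n    Blah</p>\n<p>\n    {{{body}}}</p>\n<p>\n    Blah</p>\n"
    print(unwrap_placeholder(sample))
```

Note that the pattern does not check that the closing tag has the same name as the opening one; it only requires some tag directly before the placeholder and some closing tag directly after it.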