douhuxi4145 2015-08-11 16:04

Regular expression to find HTML comments (<!-- some string -->)

I have traditionally used this regexp to find and remove HTML comments:

//remove HTML comments
$HTML = preg_replace('/<!--(.|\s)+?-->/','',$HTML);

However, this apparently crashes on one server. It works fine on my VM, but that machine is pretty high powered.

The logic is: match the start of the comment, then any character or whitespace (the + means at least one), and the ? means "don't be greedy; stop at the first --> you reach".

Is there a better way to write this, esp. the (.|\s)+? part?

3 answers

  • dongying6659 2015-08-11 16:15

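    Without a crash log, it's impossible to know for certain whether your expression is the culprit. Assuming it is, the likely cause is catastrophic backtracking or deep recursion inside PCRE. The quantifier is already lazy, so greediness itself is probably not the issue; the suspect is the repeated group (.|\s)+?, where . and \s both match spaces and tabs and every single character goes through an alternation inside a capturing group.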

    I don't advocate using regular expressions to parse HTML (you'd be better off using DOMDocument), but if you continue down the regex path, use:

    $HTML = preg_replace('/<!--([\s\S]+?)-->/','',$HTML);
    

    instead. The character class [\s\S] matches both whitespace and non-whitespace characters, including newlines, without any alternation, so it should not blow up due to backtracking.
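
As a variation on the same idea, here is a minimal sketch that uses the s (dot-all) modifier so that . matches newlines too, and that checks for a failed replacement instead of silently continuing. The function name strip_html_comments is hypothetical, and the pattern uses *? rather than +? so that an empty comment (<!---->) is also removed; both are assumptions for illustration, not part of the original post.

```php
<?php
// Hypothetical helper: removes HTML comments with a single lazy dot-all pattern.
function strip_html_comments(string $html): string
{
    // With the /s modifier, "." also matches newlines, so neither an
    // alternation nor a capturing group is needed.
    $result = preg_replace('/<!--.*?-->/s', '', $html);

    // preg_replace() returns null on failure, for example when PCRE hits
    // its backtrack limit; preg_last_error() reports the reason code.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return $result;
}

$html = "<p>keep</p><!-- one\nline two --><p>also keep</p><!----><!-- x -->";
echo strip_html_comments($html), PHP_EOL; // prints <p>keep</p><p>also keep</p>
```

Checking for null matters here because a PCRE failure otherwise turns the whole page into an empty value without any visible error.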

    Example: https://regex101.com/r/qR1xW1/1
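
For the DOMDocument route mentioned above, a rough sketch could look like the following. It assumes $html is a complete page; the function name strip_comments_with_dom is hypothetical. Note that loadHTML() normalizes the markup (it may add a doctype and html/body wrappers) and guesses the encoding when none is declared, so the output is not guaranteed to be byte-for-byte identical to the input minus comments.

```php
<?php
// Hypothetical helper: removes comment nodes via the DOM instead of a regex.
function strip_comments_with_dom(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally so imperfect real-world markup
    // does not emit PHP warnings, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the node list into an array before removing nodes (defensive:
    // avoids modifying a list while iterating over it).
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    return $doc->saveHTML();
}

$page = "<html><body><p>keep</p><!-- gone --><p>also keep</p></body></html>";
echo strip_comments_with_dom($page);
```

This avoids regex limits entirely, at the cost of re-serializing the document.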
