duanlvxi8652 2019-02-28 22:01

Improve this regex to prevent preg_replace from throwing PREG_BACKTRACK_LIMIT_ERROR

I want to remove all the script tags from an HTML page, except those containing the word foo or bar. So I came up with this statement:

$content = preg_replace('#<script((?!foo|bar).)*?</script>#is', '', $content);
echo "Last error: " . preg_last_error();

This works on smaller pages, but on a page with 30 large script tags it fails. The error I get is: PREG_BACKTRACK_LIMIT_ERROR

So I think I need to improve my regex to prevent this error, because this statement works:

$content = preg_replace('#<script.*?</script>#is', '', $content); 

But this statement removes all the script tags, while I want to keep some of them.

There are solutions that involve increasing pcre.backtrack_limit, but I don't want to go that route. There should be a better solution, in my opinion.

The thing is that I don't know how to fix this, because the issue is with the regex as far as I can see.

Could you guide me to make the regex better so this error won't occur?

1 answer

  • duanchen7703 2019-02-28 22:17

    I would strongly suggest not using regular expressions here, but making use of DOM parsing instead, which is often more appropriate in this kind of scenario:

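    $doc = new \DOMDocument();
    // LIBXML_HTML_NODEFDTD: do not add a default doctype when none is present
    $doc->loadHTML($html, LIBXML_HTML_NODEFDTD);
    
    $xpath = new \DOMXPath($doc);
    // Select every <script> whose text contains 'foo' or 'bar', then remove it
    foreach ($xpath->query('//script[contains(text(), \'foo\') or contains(text(), \'bar\')]') as $script_tag) {
      $script_tag->parentNode->removeChild($script_tag);
    }
    
    echo $doc->saveHTML();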
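
    Note that this query removes the scripts that contain foo or bar, whereas the question asks to keep exactly those and remove the rest. A minimal sketch of the inverted predicate, wrapping the conditions in XPath's not(), could look like the following. The sample $html string and the variable names are made up for illustration, and contains(., ...) is used here instead of contains(text(), ...) so that the whole string value of the element is tested:

```php
<?php
// Sample input, invented for this sketch.
$html = '<html><head><script>var foo = 1;</script><script>var x = 2;</script></head>'
      . '<body><p>Hello</p><script>bar();</script></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// not(...) inverts the predicate: select scripts containing neither word.
$query = "//script[not(contains(., 'foo') or contains(., 'bar'))]";

// Copy the node list into an array first, so that removing nodes
// cannot interfere with the iteration.
foreach (iterator_to_array($xpath->query($query)) as $scriptTag) {
    $scriptTag->parentNode->removeChild($scriptTag);
}

$result = $doc->saveHTML();
echo $result;
```

    With this input, the script assigning x should be removed while the two scripts mentioning foo and bar stay in place.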
    

    If you have more words, you can build your XPath query from an array instead:

    $blacklist = ['foo', 'bar', 'apple', 'cold'];
    
    $query = '//script[' . join(' or ', array_map(function($banword) { 
      return "contains(text(), '$banword')"; 
    }, $blacklist)) . ']';
    
    foreach ($xpath->query($query) as $script_tag) {
      $script_tag->parentNode->removeChild($script_tag);
    }
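
    The array-driven construction can also be adapted to the question's requirement, where the listed words mark the scripts to keep. The sketch below is one possible way to do that, not part of the original answer: xpathLiteral() and buildRemovalQuery() are hypothetical helper names. It also guards against words containing quotes, since XPath 1.0 string literals have no escape mechanism and an interpolated quote would break the query:

```php
<?php
// Hypothetical helper: turns a word into an XPath 1.0 string literal.
function xpathLiteral(string $word): string
{
    if (strpos($word, "'") === false) {
        return "'" . $word . "'";
    }
    if (strpos($word, '"') === false) {
        return '"' . $word . '"';
    }
    // XPath 1.0 cannot express a literal containing both quote types directly.
    throw new InvalidArgumentException('Word contains both quote types: ' . $word);
}

// Hypothetical helper: builds a query selecting the <script> elements
// that contain none of the words in $keepWords.
function buildRemovalQuery(array $keepWords): string
{
    if ($keepWords === []) {
        return '//script'; // nothing to keep, so every script is selected
    }

    $conditions = array_map(function (string $word): string {
        return 'contains(., ' . xpathLiteral($word) . ')';
    }, $keepWords);

    return '//script[not(' . implode(' or ', $conditions) . ')]';
}

$query = buildRemovalQuery(['foo', 'bar', "it's"]);
echo $query, PHP_EOL;
// prints: //script[not(contains(., 'foo') or contains(., 'bar') or contains(., "it's"))]
```

    The resulting string can be passed to $xpath->query() in the same removal loop as before.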
    

    Demo: https://3v4l.org/dHGDt
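
    If DOM parsing is not an option, a regex-based fallback is still possible. The pattern in the question likely runs into the limit because ((?!foo|bar).)*? performs a lookahead and records a backtracking position at every single character. The sketch below instead reuses the simpler pattern that the question reports as working, and moves the foo/bar check into a callback. stripScriptsExcept() is a hypothetical name, and this approach assumes the script blocks are well formed enough for the pattern to find them:

```php
<?php
// Hypothetical helper: removes <script> blocks unless they mention a keep word.
function stripScriptsExcept(string $content, array $keepWords): string
{
    $result = preg_replace_callback(
        '#<script\b.*?</script>#is',
        function (array $match) use ($keepWords): string {
            foreach ($keepWords as $word) {
                // Case-insensitive check, mirroring the "i" flag of the original pattern.
                if (stripos($match[0], $word) !== false) {
                    return $match[0]; // keep this block unchanged
                }
            }
            return ''; // drop the block
        },
        $content
    );

    // preg_replace_callback() returns null when PCRE reports an error.
    if ($result === null) {
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }

    return $result;
}

$content = '<script src="foo.js"></script><p>text</p>'
         . '<script>var x = 1;</script><script>BAR();</script>';
$cleaned = stripScriptsExcept($content, ['foo', 'bar']);
echo $cleaned, PHP_EOL;
// prints: <script src="foo.js"></script><p>text</p><script>BAR();</script>
```

    Like the regex in the question, this checks the whole matched block, attributes included, so a script with src="foo.js" is kept as well.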

    This answer was selected by the asker as the best answer.
