donglu1971 2010-12-08 22:44
浏览 30
已采纳

用于选择性剥离HTML的正则表达式

I'm trying to parse some HTML with PHP as an exercise, outputting it as just text, and I've hit a snag. I'd like to remove any tags that are hidden with style="display: none;" - bearing in mind that the tag may contain other attributes and style properties.

The code I have so far is this:

$page = preg_replace("#<([a-z]+).*?style=\".*?display:\s*none[^>]*>.*?</\1>#s","",$page);`

The code it returning NULL with a PREG_BACKTRACK_LIMIT_ERROR.
I tried this instead:

$page = preg_replace("#<([a-z]+)[^>]*?style=\"[^\"]*?display:\s*none[^>]*>.*?</\1>#s","",$page);

But now it's just not replacing any tags.

Any help would be much appreciated. Thanks!

  • 写回答

3条回答 默认 最新

  • duanlei2458 2010-12-08 22:57
    关注

    Using DOMDocument, you can try something like this:

    $doc = new DOMDocument;
    $doc->loadHTMLFile("foo.html");
    $nodeList = $doc->getElementsByTagName('*');
    foreach($nodeList as $node) {
        if(strpos(strtolower($node->getAttribute('style')), 'display: none') !== false) {
            $doc->removeChild($node);
        }
    }
    $doc->saveHTMLFile("foo.html");
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥50 安卓adb backup备份子用户应用数据失败
  • ¥20 有人能用聚类分析帮我分析一下文本内容嘛
  • ¥15 请问Lammps做复合材料拉伸模拟,应力应变曲线问题
  • ¥30 python代码,帮调试
  • ¥15 #MATLAB仿真#车辆换道路径规划
  • ¥15 java 操作 elasticsearch 8.1 实现 索引的重建
  • ¥15 数据可视化Python
  • ¥15 要给毕业设计添加扫码登录的功能!!有偿
  • ¥15 kafka 分区副本增加会导致消息丢失或者不可用吗?
  • ¥15 微信公众号自制会员卡没有收款渠道啊