dqd2800 2013-02-23 16:29

A flexible regular expression to take out part of the DOM

First, I know about Simple HTML DOM Parser and PHP's built-in solution, but to my knowledge neither of them does exactly the job I'm asking for.

I'm looking for a PHP PCRE pattern that finds an element together with its content inside the DOM, deletes it, and tolerates any extra whitespace in the markup.

Here is the markup:

<div id="maindiv">
    <div class="unusefuldiv1">Unuseful content</div>
    <div id="unusefuldiv2">Unuseful content2</div>
    <!--  ... some content I'm after for -->
</div>

I'm desperate for a regular expression pattern that deletes both .unusefuldiv1 and #unusefuldiv2 (markup together with content) and is, if possible, flexible enough to still work when, for example, <div class="unusefuldiv1"> is slightly mistyped with an extra space: <div class="unusefuldiv1" >.

That might be something similar to

preg_replace('/<div\b[^>]*>(.*?)<\/div>/is', '', $dom_content);

except that this pattern deletes all divs, whether they have a class, an id, or neither.

Does anyone have a solution?

2 answers

  • dongpu2476 2013-02-23 17:02
    $dom_content = preg_replace( 
        '/\s*<div [^<>]*unuseful[^<>]+>.*?<\/div\s*>\s*/is', '', $dom_content );
    

    This removes divs (and the surrounding whitespace) whose opening tag contains the word unuseful.
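As a quick check, here is a minimal sketch that runs the pattern above against the markup from the question, with the extra space before ">" that the asker mentioned. The variable name $cleaned is only for illustration.

```php
<?php
// Sample markup from the question; the first div has the "mistyped" extra space.
$dom_content = <<<HTML
<div id="maindiv">
    <div class="unusefuldiv1" >Unuseful content</div>
    <div id="unusefuldiv2">Unuseful content2</div>
    <!--  ... some content I'm after for -->
</div>
HTML;

// Same pattern as in the answer: "i" ignores case, "s" lets "." match newlines.
$cleaned = preg_replace(
    '/\s*<div [^<>]*unuseful[^<>]+>.*?<\/div\s*>\s*/is', '', $dom_content);

echo $cleaned, PHP_EOL;
```

The outer #maindiv and the comment should survive, while both unwanted divs and the whitespace around them are consumed. Note that the lazy .*? stops at the first closing </div>, so a target div that itself contains another div would be cut short.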

    For a better regex solution, you will need to describe the criteria for deleting a div more precisely.
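If the criteria are "class is exactly unusefuldiv1 or id is exactly unusefuldiv2", one possible tightening looks like the sketch below. The helper removeDivsByName is hypothetical (it is not part of the original answer), and it assumes double-quoted attribute values, a single class per class attribute, and no nested divs inside the targets.

```php
<?php
// Hypothetical helper: removes <div> elements whose class or id attribute
// exactly equals one of the given names. Assumes double-quoted attributes
// and no <div> nested inside the divs being removed.
function removeDivsByName(string $html, array $classes, array $ids): string
{
    $alternatives = [];
    foreach ($classes as $class) {
        $alternatives[] = 'class\s*=\s*"\s*' . preg_quote($class, '~') . '\s*"';
    }
    foreach ($ids as $id) {
        $alternatives[] = 'id\s*=\s*"\s*' . preg_quote($id, '~') . '\s*"';
    }
    if ($alternatives === []) {
        return $html; // nothing to remove
    }

    // \s before the attribute name avoids matching e.g. data-id="...";
    // [^<>]* and \s* tolerate extra whitespace inside the opening tag.
    $pattern = '~\s*<div\b[^<>]*\s(?:' . implode('|', $alternatives)
             . ')[^<>]*>.*?</div\s*>~is';

    $result = preg_replace($pattern, '', $html);
    return $result ?? $html; // preg_replace returns null on error
}

$html = "<div id=\"maindiv\">\n    <div class=\"unusefuldiv1\" >a</div>\n"
      . "    <div id=\"unusefuldiv2\">b</div>\n    <!-- wanted -->\n</div>";
echo removeDivsByName($html, ['unusefuldiv1'], ['unusefuldiv2']), PHP_EOL;
```

Unlike the broader pattern, this version leaves alone any div that merely has "unuseful" somewhere in its opening tag, which may or may not be what is wanted.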

    Selected by the asker as the best answer.