dtlygweb2017 2011-02-03 22:14

Regular expression to capture content up to the next div (excluding the div)

I have the following input:

<div style="s1">title1</div>
<div style="s1">content1</div>
<div style="s1">title2</div>
<div style="s1">content2</div>

I know title1 and title2, and I want to collect content1 and content2.

I would need something like this:

<div style="s1">title1</div>.*?<div style="s1">(.*?)</div>

but since the regexp is greedy, it matches all the way to the end, so it returns:

content1</div>
    <div style="s1">title2</div>
    <div style="s1">content2

I would like to add to the pattern a list of tags that should not be included in the match.

Something like:

<div style="s1">title1</div>.*?<div style="s1">(.*?[^<div])</div>

where [^<div] is meant to say "does not contain <div". It should accept several alternatives, probably by using |.

How can I do it?

  • doushan5222 2011-02-03 22:21

    Obligatory link.

    Now that that is out of the way, just do some DOM manipulation and XPath:

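        $dom = new DOMDocument();
        @$dom->loadHTML($html); // @ suppresses warnings from imperfect markup
        $x = new DOMXPath($dom);
        $content = array();

        foreach ($x->query("//div") as $node)
        {
           if (trim($node->textContent) == 'title1')
           {
               // nextSibling can be a whitespace text node, so skip ahead to the next element
               $next = $node->nextSibling;
               while ($next !== null && $next->nodeType !== XML_ELEMENT_NODE)
               {
                   $next = $next->nextSibling;
               }
               if ($next !== null)
               {
                   $content['title1'] = $next->textContent;
               }
           }
        }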
    

    Now wasn't that easy? So no more regexing HTML, okay?
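
The sibling lookup can also be pushed into the XPath expression itself, which sidesteps whitespace text nodes between the divs. What follows is only a sketch under assumptions: `contentAfterTitle` is a hypothetical helper name, not part of the original answer, and it assumes the title contains no double quote and that the content div is the first div sibling after the title div, as in the sample input.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed text
// of the first <div> sibling following the <div> whose text equals $title,
// or null when no such div exists.
function contentAfterTitle(DOMXPath $xpath, string $title): ?string
{
    // Assumption: $title has no double quote, because it is spliced
    // directly into the XPath string literal.
    $query = sprintf(
        '//div[normalize-space(.) = "%s"]/following-sibling::div[1]',
        $title
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample input copied from the question.
$html = <<<HTML
<div style="s1">title1</div>
<div style="s1">content1</div>
<div style="s1">title2</div>
<div style="s1">content2</div>
HTML;

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings from imperfect markup
$xpath = new DOMXPath($dom);

// Collect the content for every known title.
$content = [];
foreach (['title1', 'title2'] as $title) {
    $content[$title] = contentAfterTitle($xpath, $title);
}

print_r($content);
```

With the sample input, `$content['title1']` holds `content1` and `$content['title2']` holds `content2`. The `following-sibling::div[1]` step only considers element nodes, so the newline text nodes between the divs never get in the way.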
