dpict99695329 2011-01-04 22:26 Acceptance rate: 100%

Match pattern within pattern

I'm trying to match any bracketed items within <sup> tags.

My regular expression is being too greedy, starting with the first <sup> tag and ending at the last </sup> tag.

/<sup\b[^>]*>(.*?)\[(.*?)\](.*?)<\/sup>/

Example HTML:

<sup>[this should be gone]</sup>
<sup>but this should stay</sup>
<sup>this should [ also stay</sup>
[and this as well]
<sup><a href="#">[but this should definitely go]</a></sup>

Any idea why?

Thanks!

EDIT: I suppose these answers make sense. I've got much of the HTML parsed without regex; I just figured that this particular example would work with regex because it would do the following:

  1. see the first <sup> tag
  2. find the first instance of </sup>
  3. search the inside for (wild)(bracket)(wild)(closing bracket)(wild)
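
The gap between these three steps and what the engine actually does can be seen in a small sketch. It uses Python's `re` module purely for illustration (the lazy-quantifier behavior shown is the same in PCRE-style engines), and the single-line input string is a hypothetical example, not taken from the real page. A lazy `(.*?)` does not stop at the first `</sup>`; it keeps growing until the rest of the pattern can match somewhere.

```python
import re

# The pattern from the question, with the /.../ delimiters dropped.
PATTERN = re.compile(r"<sup\b[^>]*>(.*?)\[(.*?)\](.*?)</sup>")

# Hypothetical single-line input: the first <sup> contains no brackets.
html = "<sup>but this should stay</sup> [and this as well] <sup>plain</sup>"

match = PATTERN.search(html)

# The first lazy group runs past the first </sup> to reach the bracket
# that sits outside any <sup>, so the match spans both <sup> elements.
print(match.group(0))
print(repr(match.group(1)))
```

Nothing in the pattern ties the brackets to the inside of a single `<sup>...</sup>` pair, which is why step 2 of the list never really happens.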

5 answers

  • douyan8413 2011-01-04 22:32

    You really can't do this. It's impossible to parse HTML with regular expressions, because regular expressions (in the formal sense) can only match regular languages, which are a simpler subset of the languages we actually use. One very common non-regular language is the Dyck language of balanced brackets: it's impossible to match correctly nested parentheses with a regular expression. And HTML, if you think about it, is the same thing, with tags replacing parentheses. Thus, matching (a) correctly nested sup tags and (b) balanced brackets are both impossible.

    I don't use PHP myself, but I know it has access to an HTML DOM; I'd recommend using that instead. Walk the DOM for every sup tag and check each one's inner text. If you only want to catch tags whose inner text is just [...], where the ... does not contain a closing square bracket, you can use ^\[[^\]]+\]$ as your regex; if you want real nesting, more complicated checking is necessary.
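
A minimal sketch of that approach, under some loud assumptions: it is written in Python rather than PHP, it uses the standard library's event-based `html.parser` instead of a real DOM, and `SupTextCollector` is a hypothetical helper name. It gathers the inner text of each `<sup>` element from the question's example and applies the `^\[[^\]]+\]$` check to it.

```python
import re
from html.parser import HTMLParser

# The check from the answer: inner text is exactly one bracketed item.
BRACKETED = re.compile(r"^\[[^\]]+\]$")


class SupTextCollector(HTMLParser):
    """Collects the inner text of each outermost <sup> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <sup> tags currently open
        self.buffer = []  # text pieces seen inside the open <sup>
        self.texts = []   # finished inner texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        # Text in child tags such as <a> is included, since depth > 0.
        if self.depth > 0:
            self.buffer.append(data)


html = """<sup>[this should be gone]</sup>
<sup>but this should stay</sup>
<sup>this should [ also stay</sup>
[and this as well]
<sup><a href="#">[but this should definitely go]</a></sup>"""

collector = SupTextCollector()
collector.feed(html)
collector.close()

to_remove = [text for text in collector.texts if BRACKETED.match(text)]
print(to_remove)
```

The sketch only identifies which `<sup>` elements qualify; actually deleting them from the document is easier with a real DOM, where the matching node can be removed from its parent.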

    Accepted by the asker as the best answer.