dongyou5271 2013-09-16 21:43
Accepted

preg_match_all: capturing the content up to the next HTML comment tag

I'm trying to capture both the text between the brackets of each comment tag and all the text that follows it, up to the next occurrence of a comment tag. At the moment I only get the text between the brackets; the content up to the next comment comes back as an empty string "". I'm kind of confused. Thanks!

header("Content-Type:text/plain");
$tmp= file_get_contents("filter.html");
preg_match_all('@<!--\[(.*?)\]-->(.*?)@su', $tmp, $found, PREG_SET_ORDER);
var_dump($found);

filter.html

<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ

The output I get is:

array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(15) "<!--[%TEST%]-->"
    [1]=>
    string(6) "%TEST%"
    [2]=>
    string(0) ""
  }
  [1]=>
  array(3) {
    [0]=>
    string(14) "<!--[%DAS%]-->"
    [1]=>
    string(5) "%DAS%"
    [2]=>
    string(0) ""
  }
  [2]=>
  array(3) {
    [0]=>
    string(14) "<!--[%BKK%]-->"
    [1]=>
    string(5) "%BKK%"
    [2]=>
    string(0) ""
  }
}

1 answer

  • douti0687 2013-09-16 21:45

    Solution: change the regex to:

    @<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su
    

    Codepad Viper Demo.


    Explanation: the original regex almost correctly used the .*? expression to capture the non-comment part. I say 'correctly' because the lazy quantifier is indeed required here (otherwise .* would happily consume the whole string). And I say 'almost' because the quantifier is too lazy in this particular case: nothing follows it in the pattern, so even an empty string is enough to satisfy it (as '' does match /.*/). That's why you get those empty strings in $found; they are the victims of laziness taken to the extreme.
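
To see the "too lazy" behaviour in isolation, here is a minimal sketch that runs the question's original pattern against a single tag followed by some text. The inline input string and the variable name $m are only illustrative; they are not from the original post.

```php
<?php
// Original pattern from the question: nothing follows the second (.*?),
// so the lazy quantifier is satisfied with zero characters.
$input = "<!--[%TEST%]-->\nTEST\nTEST\n";

preg_match('@<!--\[(.*?)\]-->(.*?)@su', $input, $m);

var_dump($m[1]); // the text between the brackets
var_dump($m[2]); // the trailing lazy group: an empty string
```

The first lazy group still captures %TEST% because the literal \]--> after it forces it to expand; the second group has no such anchor.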

    So what we need is to make this part of the regex a bit more 'eager' - persuade it to keep devouring the string until it...

    • either encounters the beginning of a new comment ('<!--'),
    • or arrives at the end of the string.

    And that's exactly expressed by this lookahead pattern:

    (?=<!--|$)
    

    It reads as 'match ONLY at a position that is either followed by a new comment or is the end of the string'. And that's how it whips the lazy .*? sub-expression into useful work: it can no longer stop wherever it alone wants to.
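
Putting it together, here is a sketch of the corrected pattern applied to the sample data from the question. It assumes the contents of filter.html are inlined as a string (instead of being read with file_get_contents) so the snippet is self-contained; the $sections array and the trim() call are additions for illustration, not part of the original answer.

```php
<?php
// Inline copy of the filter.html sample from the question.
$html = <<<'HTML'
<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ
HTML;

// Same regex as in the answer: the lookahead forces the lazy group to keep
// consuming until the next comment opener or the end of the string.
preg_match_all('@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su', $html, $found, PREG_SET_ORDER);

// Hypothetical post-processing: map each tag name to its trimmed content.
$sections = [];
foreach ($found as $match) {
    $sections[$match[1]] = trim($match[2]);
}

print_r($sections);
```

Each captured block keeps its surrounding newlines, which is why the sketch trims it. If the content itself may contain ordinary HTML comments, one option is to tighten the lookahead to (?=<!--\[|$) so that only bracketed tags end a block.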
