dongyou5271 2013-09-16 21:43

preg_match_all up to the next comment tag (HTML include)

For each comment tag, I am trying to capture both the text between the brackets inside the tag and all the text up to the next occurrence of a comment tag. At the moment I only get the text between the brackets; the content up to the next comment comes back as an empty string "". I'm kind of confused. Thanks!

header("Content-Type:text/plain");
$tmp= file_get_contents("filter.html");
preg_match_all('@<!--\[(.*?)\]-->(.*?)@su', $tmp, $found, PREG_SET_ORDER);
var_dump($found);

filter.html

<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ

The output I get is:

array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(15) "<!--[%TEST%]-->"
    [1]=>
    string(6) "%TEST%"
    [2]=>
    string(0) ""
  }
  [1]=>
  array(3) {
    [0]=>
    string(14) "<!--[%DAS%]-->"
    [1]=>
    string(5) "%DAS%"
    [2]=>
    string(0) ""
  }
  [2]=>
  array(3) {
    [0]=>
    string(14) "<!--[%BKK%]-->"
    [1]=>
    string(5) "%BKK%"
    [2]=>
    string(0) ""
  }
}

1 answer

  • douti0687 2013-09-16 21:45

    Solution: change the regex to:

    @<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su
    

    Codepad Viper Demo.


    Explanation: the original regex almost correctly used the .*? expression to capture the non-comment part. I say 'correctly' because the lazy quantifier is indeed required here (otherwise the .* combo will happily consume the whole string). And I say 'almost' because the quantifier is too lazy in this particular case: it sits at the very end of the pattern, so nothing after it forces it to grow, and even an empty string is enough to satisfy it (as '' does match /.*/). That's why you get those empty strings in $found - the victims of laziness taken to the extreme, they were...
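The difference between the two quantifiers can be seen in a minimal sketch. The input string below is a made-up two-section sample, not the original filter.html, and the variable names `$lazy` and `$greedy` are only illustrative.

```php
<?php
// Hypothetical sample input with two comment-delimited sections.
$text = "<!--[%A%]-->\nfoo\n<!--[%B%]-->\nbar\n";

// Lazy group at the very end of the pattern: nothing forces it to grow,
// so it settles for zero characters.
preg_match('@<!--\[(.*?)\]-->(.*?)@su', $text, $lazy);

// Greedy group: with the s flag it swallows everything up to the end
// of the string, including the second comment tag.
preg_match('@<!--\[(.*?)\]-->(.*)@su', $text, $greedy);

var_dump($lazy[2]);   // string(0) ""
var_dump($greedy[2]); // the rest of the input after the first tag
```

Neither variant gives the wanted result on its own, which is why the pattern needs something after the lazy group to stop it at the right place.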

    So what we need is to make this part of the regex a bit more 'eager' - persuade it to keep devouring the string until it...

    • either encounters the beginning of a new comment ('<!--')
    • or arrives at the end of the string.

    And that's exactly expressed by this lookahead pattern:

    (?=<!--|$)
    

    It reads as 'match ONLY at a position that is either followed by a new comment, or is actually the end of the string'. And that's how it whips the lazy .*? sub-expression into helpful movement: it is no longer able to stop wherever it alone wants to.
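Put together, the fix can be tried with a short self-contained sketch. It assumes the sample filter.html from the question, inlined as a string so it runs without the file; the `$sections` array and the `trim()` call are additions for illustration and are not part of the original answer.

```php
<?php
// Inline copy of the sample filter.html from the question.
$tmp = implode("\n", [
    '<!--[%TEST%]-->', 'TEST', 'TEST',
    '<!--[%DAS%]-->', 'DAS TEST', '123456',
    '<!--[%BKK%]-->', 'ABCDEFG', 'YXZ',
]);

// Same pattern as in the answer: the lookahead makes the lazy group
// expand until the next '<!--' or the end of the string.
preg_match_all('@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su', $tmp, $found, PREG_SET_ORDER);

// Illustrative post-processing: map each tag name to its trimmed content.
$sections = [];
foreach ($found as $m) {
    $sections[$m[1]] = trim($m[2]);
}

print_r($sections);
```

Note that the lookahead stops at any '<!--', so an ordinary HTML comment inside a section would cut that section short. If that matters, tightening the lookahead to (?=<!--\[|$) is one possible adjustment, offered here as an untested assumption about the input rather than as part of the accepted solution.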

    This answer was selected by the asker as the best answer.
