preg_match_all - greedy part of a regular expression, but maximize the number of matches

I have the following html to parse:

<h1 class="x">test</h1>
<p>some text <img src="x" /></p>

<h1 class="x1">test2</h1>
<p>some text </p>

<h1 class="2">test3</h1>
<p>some text <img src="x" /></p>

Can I parse this into an array with a single regular expression?

I tried

preg_match_all('#(<h1[^>]*?>)(.*?)(</h1>)(.*)#ism',$html,$arr);

which gives me only one entry, because the last part of the regex is greedy, and

preg_match_all('#(<h1[^>]*?>)(.*?)(</h1>)(.*?)#ism',$html,$arr);

which gives me none of the HTML between one </h1> and the next <h1>, because the last group is not greedy (it is lazy, so it settles for an empty match).
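Running both attempts against the sample input makes the two failure modes visible. This is a minimal sketch, assuming a reasonably recent PHP; `$greedy`, `$lazy` and the count variables are illustrative names.

```php
<?php
// Sample input from the question (nowdoc, so nothing is interpolated).
$html = <<<'HTML'
<h1 class="x">test</h1>
<p>some text <img src="x" /></p>

<h1 class="x1">test2</h1>
<p>some text </p>

<h1 class="2">test3</h1>
<p>some text <img src="x" /></p>
HTML;

// Attempt 1: with the s flag, the greedy (.*) runs to the end of the input,
// so the first match swallows the whole document.
$greedyCount = preg_match_all('#(<h1[^>]*?>)(.*?)(</h1>)(.*)#ism', $html, $greedy);

// Attempt 2: nothing follows the lazy (.*?), so it is satisfied by the
// empty string and captures nothing after each </h1>.
$lazyCount = preg_match_all('#(<h1[^>]*?>)(.*?)(</h1>)(.*?)#ism', $html, $lazy);

echo $greedyCount, "\n"; // 1
echo $lazyCount, "\n";   // 3
var_dump($lazy[4]);      // every entry is an empty string
```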

How can I make the part after the </h1> match greedily, while at the same time matching as many occurrences as possible?

Additional comments:

The question is fairly academic: I have solved the problem using preg_split, and a variety of other methods would work, but they may have downsides (for example, DOM parsing may not cope with invalid HTML that I cannot control). However, it is a recurring problem that I'd like to understand better.
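The preg_split solution is not shown in the question, so the following is only a hypothetical reconstruction of that approach: split the input just before every <h1 with a zero-width lookahead, then parse each piece on its own. Because each piece contains a single heading, a greedy tail is safe there. The `heading` and `body` keys are illustrative names, not taken from the question.

```php
<?php
// Sample input from the question.
$html = <<<'HTML'
<h1 class="x">test</h1>
<p>some text <img src="x" /></p>

<h1 class="x1">test2</h1>
<p>some text </p>

<h1 class="2">test3</h1>
<p>some text <img src="x" /></p>
HTML;

// Split immediately before every <h1 tag. The lookahead is zero-width, so no
// characters are removed; PREG_SPLIT_NO_EMPTY discards any empty piece,
// such as one that would precede a leading <h1.
$sections = preg_split('#(?=<h1\b)#i', $html, -1, PREG_SPLIT_NO_EMPTY);

$result = [];
foreach ($sections as $section) {
    // Each piece holds at most one heading, so the greedy (.*) can only run
    // to the end of this piece. Pieces not starting with <h1 are skipped.
    if (preg_match('#^(<h1[^>]*>)(.*?)(</h1>)(.*)#is', $section, $m)) {
        $result[] = ['heading' => $m[2], 'body' => trim($m[4])];
    }
}

print_r($result);
```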

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

2 answers

dpquu9206 2011-03-02 21:59
You need some form of end marker. The regex cannot guess where you want the match to stop.

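One option in this case is a lookahead assertion after the final (.*?):

(?=<h1|</body>|\z)#ims

so the complete call becomes

preg_match_all('#(<h1[^>]*?>)(.*?)(</h1>)(.*?)(?=<h1|</body>|\z)#ism', $html, $arr);

The lazy group then keeps expanding until it reaches the next <h1, a closing </body>, or the end of the input (\z). Because the lookahead does not consume anything, the next <h1 is still available for the following match.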
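Applied to the sample input, the lookahead version yields one match per heading. A minimal sketch; with the default PREG_PATTERN_ORDER, `$arr[2]` holds the heading texts and `$arr[4]` the HTML that follows each </h1>.

```php
<?php
// Sample input from the question.
$html = <<<'HTML'
<h1 class="x">test</h1>
<p>some text <img src="x" /></p>

<h1 class="x1">test2</h1>
<p>some text </p>

<h1 class="2">test3</h1>
<p>some text <img src="x" /></p>
HTML;

// Final group stays lazy, but the zero-width lookahead forces it to expand
// up to the next <h1, a closing </body>, or the end of the subject (\z).
$pattern = '#(<h1[^>]*?>)(.*?)(</h1>)(.*?)(?=<h1|</body>|\z)#ism';
$count = preg_match_all($pattern, $html, $arr);

echo $count, "\n"; // 3
print_r($arr[2]);  // heading texts
print_r($arr[4]);  // HTML after each </h1>, up to the next heading
```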
This answer was selected by the asker as the best answer.
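If the last group really should be greedy, as the question asks, a different technique (not the one used in the answer above) is a tempered greedy token: `((?:(?!<h1|</body>).)*)` consumes one character at a time, greedily, but refuses to step onto the start of the next <h1 or </body>. A sketch on the same sample input, under the same assumptions:

```php
<?php
// Sample input from the question.
$html = <<<'HTML'
<h1 class="x">test</h1>
<p>some text <img src="x" /></p>

<h1 class="x1">test2</h1>
<p>some text </p>

<h1 class="2">test3</h1>
<p>some text <img src="x" /></p>
HTML;

// Tempered greedy token: before consuming each character (the s flag lets
// . match newlines), the negative lookahead checks that we are not at the
// start of "<h1" or "</body>". The group is greedy, yet cannot cross into
// the next heading, so preg_match_all still finds every section.
$pattern = '#(<h1[^>]*?>)(.*?)(</h1>)((?:(?!<h1|</body>).)*)#ism';
$count = preg_match_all($pattern, $html, $arr);

echo $count, "\n"; // 3
print_r($arr[4]);  // HTML after each </h1>
```

On this input it captures the same groups as the lookahead version. The trade-off is a lookahead test at every character, which can be slower on large inputs.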