What is the best pattern for extracting portions of a string with preg_match_all?

Context:

• Using file_get_contents on a URL, I get a lot of markup such as <item></item>, <url></url>, etc.

• I'm using preg_match_all to extract the URL, title, etc.

Example:

$jStringToSubStract = '<a>stuffA</a><b>stuffB</b><url>http...</url>';
preg_match_all("#<url>(.*?)<\/url>#sx", $jStringToSubStract, $subItems, PREG_SET_ORDER);
foreach ($subItems as $subItem) {
    if (strlen($subItem[1]) > 0) {
        echo $subItem[1]; // prints the http... text INSIDE <url></url>
    }
}

But it's slow on large inputs.

Is there a faster alternative to preg_match_all for extracting portions of a string?

doupo2241 2017-05-25 06:53
After seeing your posted solution, I now understand what you are trying to achieve. Since you are capturing only substrings in the format of [attrname]=[attrvalue] (which may be single quoted, double quoted, or not quoted at all), these are optimized patterns for you...

This one will get ALL attributes: \K\S+=["']?[^>"']+["']?>??

This one will get specific attributes: \K(?:alt|title|src|href)=["']?[^>"']+["']?>??

These patterns do not use capture groups. This means your code will avoid unnecessary result array bloat and access the substrings as fullstring matches. Both of these patterns will run more efficiently than the patterns you have posted.
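To illustrate reading the results as fullstring matches, here is a minimal sketch that runs the specific-attributes pattern, copied as written above, against a made-up HTML snippet. The sample markup, the `~` delimiter, and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample input; not taken from the original question.
$html = '<img src="a.png" alt="Logo"><a href="/x" title="T">x</a>';

// The specific-attributes pattern from above, wrapped in ~ delimiters.
// It contains no capture groups, so only index 0 of $matches is populated.
$pattern = '~\K(?:alt|title|src|href)=["\']?[^>"\']+["\']?>??~';

$count = preg_match_all($pattern, $html, $matches);

// Each fullstring match is one attrname=attrvalue substring.
foreach ($matches[0] as $attribute) {
    echo $attribute, PHP_EOL;
}
```

Because there is no `$matches[1]`, the result array stays one level shallower than it would with a capture group.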

I should also mention that neither my patterns nor yours are 100% reliable, because nothing checks that these substrings are actually inside HTML tags. This is why using an HTML parser is strongly recommended. If you are certain that the text you'll be reading won't contain any free-floating \S=\S formatted strings outside of the tags, then the results will be fine.
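For comparison, a parser-based approach might look like the sketch below. It uses PHP's bundled DOM extension (`DOMDocument`) and only reports attributes that belong to real elements. The sample markup and the `name=value` output format are assumptions chosen for illustration, and it may well be slower than the regex on large inputs.

```php
<?php
// Hypothetical sample input: note the free-floating "src=oops" in plain text.
$html = '<p>free text src=oops</p><img src="a.png" alt="Logo">';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$wanted = ['alt', 'title', 'src', 'href'];
$found = [];

// Walk every element and collect only the attributes of interest.
foreach ($doc->getElementsByTagName('*') as $element) {
    foreach ($wanted as $name) {
        if ($element->hasAttribute($name)) {
            $found[] = $name . '=' . $element->getAttribute($name);
        }
    }
}

print_r($found);
```

The text "src=oops" inside the paragraph is not reported, since it is not an attribute of any element.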

This answer was accepted by the asker as the best answer.



