douzou0073 2017-04-23 20:56
浏览 153
已采纳

preg_match_all提取字符串部分的最佳模式是什么?

Context ;

• from file_get_contents from url, i get lots of stuff like <item></item>, <url></url>, etc.

• i'm using preg_match_all to extract url, title, etc.

example:

$jStringToSubStract = '<a>stuffA</a><b>stuffB</b><url>http...</url>';
preg_match_all("#<url>(.*?)<\/url>#sx", $jStringToSubStract , $subItems, PREG_SET_ORDER);
foreach ( $subItems as $subItem  ) {        
        if ( strlen ($subItem[1]) > 0 ) {
            echo $subItem[1]; // this is returning the http... INSIDE <url></url> 
        }
}

but it's slow for a large amount...

Is there a faster alternative to preg_match_all to extract portion of strings ?

  • 写回答

2条回答 默认 最新

  • doupo2241 2017-05-25 06:53
    关注

    After seeing your posted solution, I now understand what you are trying to achieve. Since you are capturing only substrings in the format of [attrname]=[attrvalue] (which may be single quoted, double quoted, or not quoted at all), these are optimized patterns for you...

    This one will get ALL attributes: \K\S+=["']?[^>"']+["']?>?? Demo

    This one will get specific attributes: \K(?:alt|title|src|href)=["']?[^>"']+["']?>?? Demo

    These patterns do not use capture groups. This means your code will avoid unnecessary result array bloat and access the substrings as fullstring matches. Both of these patterns will run more efficiently than the patterns you have posted.

    I should also mention that both my patterns and your patterns are not 100% reliable because there is no check that these substrings are actually inside of html tags. This is the reason why html-parsing programs are strenuously encouraged. If you are certain that the text that you'll be reading won't have any free floating \S=\S formatted strings outside of the tags, then the results will be fine.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 ansys fluent计算闪退
  • ¥15 有关wireshark抓包的问题
  • ¥15 需要写计算过程,不要写代码,求解答,数据都在图上
  • ¥15 向数据表用newid方式插入GUID问题
  • ¥15 multisim电路设计
  • ¥20 用keil,写代码解决两个问题,用库函数
  • ¥50 ID中开关量采样信号通道、以及程序流程的设计
  • ¥15 U-Mamba/nnunetv2固定随机数种子
  • ¥15 vba使用jmail发送邮件正文里面怎么加图片
  • ¥15 vb6.0如何向数据库中添加自动生成的字段数据。