douzou0073 2017-04-23 20:56

What is the best pattern for extracting parts of a string with preg_match_all?

Context:

• Using file_get_contents on a URL, I get lots of markup such as <item></item>, <url></url>, etc.

• I'm using preg_match_all to extract the url, title, etc.

Example:

$jStringToSubStract = '<a>stuffA</a><b>stuffB</b><url>http...</url>';
// Capture the text between <url> and </url>; the "s" flag lets "." match newlines.
preg_match_all('#<url>(.*?)</url>#s', $jStringToSubStract, $subItems, PREG_SET_ORDER);
foreach ($subItems as $subItem) {
    // Skip empty <url></url> pairs.
    if (strlen($subItem[1]) > 0) {
        echo $subItem[1]; // prints the http... found INSIDE <url></url>
    }
}

But it's slow on large inputs...

Is there a faster alternative to preg_match_all for extracting portions of strings?

  • doupo2241 2017-05-25 06:53

    After seeing your posted solution, I now understand what you are trying to achieve. Since you are capturing only substrings in the format of [attrname]=[attrvalue] (which may be single quoted, double quoted, or not quoted at all), these are optimized patterns for you...

    This one will get ALL attributes: \K\S+=["']?[^>"']+["']?>??

    This one will get specific attributes: \K(?:alt|title|src|href)=["']?[^>"']+["']?>??

    These patterns do not use capture groups. This means your code will avoid unnecessary result array bloat and access the substrings as fullstring matches. Both of these patterns will run more efficiently than the patterns you have posted.
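
    As a rough illustration of reading full-string matches, here is a minimal sketch that runs the "specific attributes" pattern through preg_match_all. The sample markup in $html, the ~ delimiters, and the variable names are assumptions made for this example, and every attribute value in it is quoted.

```php
<?php
// Hypothetical sample markup (not from the original post); all values are quoted.
$html = '<img src="a.png" alt=\'pic\'><a href="page.html" title="Home">x</a>';

// The "specific attributes" pattern from the answer, wrapped in ~ delimiters.
$pattern = '~\K(?:alt|title|src|href)=["\']?[^>"\']+["\']?>??~';

// The pattern has no capture groups, so $matches holds only index 0:
// the list of full-string matches.
$count = preg_match_all($pattern, $html, $matches);

foreach ($matches[0] as $attribute) {
    echo $attribute, PHP_EOL;
}
```

    With this input, $matches[0] should contain the four attribute strings in document order. Note that the value character class also accepts spaces, so an unquoted value followed by another attribute may run on into it; that case is worth testing against your own data.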

    I should also mention that neither my patterns nor yours are 100% reliable, because nothing checks that these substrings are actually inside HTML tags. This is why HTML parsers are strenuously encouraged. If you are certain that the text you'll be reading has no free-floating \S=\S formatted strings outside of the tags, then the results will be fine.
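
    To make the parser suggestion concrete, here is a small sketch using PHP's bundled DOM extension. The sample markup, the choice of the img tag, and the attribute list are assumptions for illustration only. The input deliberately contains a free-floating alt=outside in plain text, which a parser does not treat as an attribute.

```php
<?php
// Hypothetical input: "alt=outside" is plain text, not an attribute.
$html = '<p>alt=outside</p><img src="a.png" alt="pic">';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$found = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // Read only the attributes of interest from real <img> elements.
    foreach (['src', 'alt'] as $name) {
        if ($img->hasAttribute($name)) {
            $found[] = $name . '=' . $img->getAttribute($name);
        }
    }
}

print_r($found);
```

    Here only the attributes that belong to the img element are collected. Whether this is faster than a regex on large inputs is not something this sketch measures; the trade-off is correctness against stray text.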

    This answer was selected as the best answer by the asker.