dtu11716 2013-09-23 04:36
Views: 166
Accepted

PHP: preg_replace eats all memory

I am processing a couple of GB of text, and my script dies on preg_replace(). After some research I extracted the problematic part of the text, where the leak appears.

preg_replace('/\b\p{L}{0,2}\b/u', '', "\x65\xe2\xba\xb7\x69\xe3\xb1\xae"); 

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 251105872 bytes)

I am trying to delete short words (up to 2 characters). I also found that if I change the regexp to:

preg_replace('/\b\p{L}{1,2}\b/u', '', "\x65\xe2\xba\xb7\x69\xe3\xb1\xae"); 

it works just fine.

Can somebody explain what's going on, please? The first example works on 99% of texts.


1 answer

  • dqwolwst50489 2013-09-23 05:25
    \b\p{L}{0,2}\b
            ^
    

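    The 0 here lets the pattern match with zero letters, so the regex also matches the empty string at word boundaries, not just the short words you want. You can easily end up with twice as many matches, or more, to find and replace.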

    E.g.: with a "Lorem ipsum" text you get 344 matches using \b\p{L}{0,2}\b (regex101 demo) but only 19 using \b\p{L}{1,2}\b (regex101 demo).

    And if it's a replace, you get so many more to do!
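
The difference in match counts can be checked locally with a minimal sketch like the one below. The sample string `a bc def` is a made-up illustration, not taken from the question, and the snippet only compares how many matches each quantifier produces on small ASCII input. It does not attempt to reproduce the memory error.

```php
<?php
// Hypothetical sample text (not from the original post); plain ASCII keeps the demo small.
$text = 'a bc def';

// {0,2} also accepts zero letters, so the pattern can match the empty
// string at word boundaries in addition to the short words.
$withEmpty = preg_match_all('/\b\p{L}{0,2}\b/u', $text, $m0);

// {1,2} requires at least one letter, so only real short words match.
$nonEmptyOnly = preg_match_all('/\b\p{L}{1,2}\b/u', $text, $m1);

printf("{0,2}: %d matches, {1,2}: %d matches\n", $withEmpty, $nonEmptyOnly);

// Removing the short words with the stricter pattern leaves only "def"
// and the original spaces.
$cleaned = preg_replace('/\b\p{L}{1,2}\b/u', '', $text);
var_dump($cleaned);
```

On input like this, the `{1,2}` version should report only the two short words, while the `{0,2}` version reports those plus several empty matches. The replaced text is the same either way, since replacing an empty match with an empty string changes nothing, so `{1,2}` does the same job with less work.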

    This answer was selected as the best answer by the asker.