dongmu3457 asked 2009-08-10 14:19 · acceptance rate: 0%
Viewed 38 times
Accepted

Parsing large text files with PHP without killing the server

I'm trying to read some large text files (between 50 MB and 200 MB) and do a simple text replacement (essentially, the XML I have hasn't been properly escaped in a few regular, predictable cases). Here's a simplified version of the function:

<?php
// Wrap the contents of <text> elements in CDATA, reading and writing
// one line at a time so only a single line is held in memory.
function cleanFile($file1, $file2) {
  $input_file  = fopen($file1, "r");
  $output_file = fopen($file2, "w");
  while (!feof($input_file)) {
    $buffer = trim(fgets($input_file, 4096));
    // Only transform <text> lines that aren't already CDATA-wrapped.
    if (substr($buffer, 0, 6) == '<text>' AND substr($buffer, 0, 15) != '<text><![CDATA[') {
      $buffer = str_replace('<text>', '<text><![CDATA[', $buffer);
      $buffer = str_replace('</text>', ']]></text>', $buffer);
    }
    fputs($output_file, $buffer . "\n");
  }
  fclose($input_file);
  fclose($output_file);
}
?>

What I don't get is that for the largest of these files, around 150 MB, PHP memory usage goes off the chart (around 2 GB) before failing. I thought this was the most memory-efficient way to read large files. Is there some method I'm missing that would be more memory-efficient? Or perhaps some setting that's keeping things in memory when they should be collected?

In other words, it's not working and I don't know why, and as far as I know I am not doing things incorrectly. Any direction for me to go? Thanks for any input.


3 Answers

  • dougui1977 2009-08-10 14:21

    PHP isn't really designed for this. Offload the work to a different process and call it or start it from PHP. I suggest using Python or Perl.
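Following that suggestion, here is a minimal sketch of what such an external Python script could look like (the file names are taken from the command line and the replacement logic mirrors the PHP function above; this is one possible approach, not the poster's actual solution). Iterating over the file object reads one line at a time, so memory use stays constant regardless of file size, and PHP could launch it with `exec()`.

```python
import sys

def clean_line(line: str) -> str:
    """Wrap the contents of a <text> element in CDATA, unless already wrapped."""
    stripped = line.strip()
    if stripped.startswith('<text>') and not stripped.startswith('<text><![CDATA['):
        stripped = stripped.replace('<text>', '<text><![CDATA[')
        stripped = stripped.replace('</text>', ']]></text>')
    return stripped

def clean_file(src: str, dst: str) -> None:
    # Iterating over the file handle streams one line at a time;
    # the whole file is never loaded into memory.
    with open(src) as infile, open(dst, 'w') as outfile:
        for line in infile:
            outfile.write(clean_line(line) + '\n')

if __name__ == '__main__':
    clean_file(sys.argv[1], sys.argv[2])
```

From PHP this could be invoked with something like `exec('python clean.py in.xml out.xml')`, keeping the heavy lifting out of the PHP process entirely.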

    This answer was selected by the asker as the best answer.
