doupeng8419 2011-09-06 10:57

Processing a very large CSV file without timeouts or memory errors

I'm currently writing an import script for a very large CSV file. The problem is that most of the time it stops after a while because of a timeout, or it throws a memory error.

My idea was to parse the CSV file in steps of 100 lines and have the script call itself again automatically after each step. I tried to do this with header('Location: ...'), passing the current line number via GET, but it didn't work out the way I wanted.

Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
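
One way the chunked idea could be made to work is to pass a byte offset between runs instead of a line number, so that each run can seek straight to where the previous one stopped rather than re-reading the file from the top. The following is only a minimal sketch under that assumption; the function name `processChunk`, its parameters and the callback are hypothetical and not taken from the question.

```php
<?php
// Sketch only: processChunk() and its parameters are illustrative assumptions.

/**
 * Read at most $maxRows CSV rows starting at byte $offset and pass each
 * row to $handleRow. Returns the byte offset to resume from, or null
 * once the end of the file has been reached.
 */
function processChunk(string $path, int $offset, int $maxRows, callable $handleRow): ?int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Jump directly to where the previous run stopped.
    fseek($handle, $offset);

    $rows = 0;
    while ($rows < $maxRows
        && ($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $handleRow($row);
        $rows++;
    }

    // Remember the position of the next unread row.
    $next = ftell($handle);
    fclose($handle);

    // A position at (or past) the file size means nothing is left to read.
    return $next < filesize($path) ? $next : null;
}
```

In a web context the returned offset would be what gets handed to the next request, for example as a GET parameter in the redirect. If the file starts with a header row, the run at offset 0 has to skip it.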

5 answers

  • douqiang5933 2011-09-06 11:19

    I've used fgetcsv to read a 120 MB CSV as a stream (is that correct English?). It reads the file line by line, and I inserted each line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run. Maybe I'll try Python next time… Don't try to load a huge CSV file into an array; that really would consume a lot of memory.

    // WDI_GDF_Data.csv (120.4MB) is the World Bank collection of development indicators:
    // http://data.worldbank.org/data-catalog/world-development-indicators
    if(($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
    {
        // get the first row, which contains the column-titles (if necessary)
        $header = fgetcsv($handle);
    
        // loop through the file line-by-line
        while(($data = fgetcsv($handle)) !== false)
        {
            // re-sort/rewrite data and insert into DB here
            // try to use conditions sparingly here, as they slow things down
    
            // probably not necessary, since $data is overwritten on the next
            // iteration anyway, but it does no harm;
            // see also: http://php.net/manual/en/features.gc.php
            unset($data);
        }
        fclose($handle);
    }
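
The "insert into DB here" step is left open above. As a rough sketch of one way to fill it in, the rows can go through a single PDO prepared statement and be committed in batches, which usually costs far less than one implicit transaction per row. The table `indicators`, its two columns and the batch size are assumptions made for illustration only, not part of the original script.

```php
<?php
// Sketch only: table name, column layout and batch size are assumptions.

/**
 * Stream a CSV file into a table, committing every $batchSize rows.
 * Returns the number of rows inserted.
 */
function importCsv(PDO $pdo, string $path, int $batchSize = 1000): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Skip the header row with the column titles.
    fgetcsv($handle, 0, ',', '"', '\\');

    // Hypothetical two-column table; adapt to the real schema.
    $stmt = $pdo->prepare('INSERT INTO indicators (name, value) VALUES (?, ?)');

    $count = 0;
    $pdo->beginTransaction();
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $stmt->execute([$row[0], $row[1]]);
        if (++$count % $batchSize === 0) {
            $pdo->commit();           // write this batch
            $pdo->beginTransaction(); // start the next one
        }
    }
    $pdo->commit(); // write the final, possibly partial batch
    fclose($handle);

    return $count;
}
```

Only one row is held in memory at a time, as before. For the timeout, the script can be run from the command line, where PHP applies no execution time limit by default, or it can call set_time_limit(0) when started from a browser.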
    
    This answer was selected by the asker as the best answer.