doushai7225 2014-07-10 06:41
Viewed 55 times
Accepted

Efficiently creating a large CSV file in PHP

I have to create a big CSV export file of more than 400 MB with PHP. First drafts of the export file and the PHP code already allow some guesses about the performance.

To avoid extremely long processing times, I should focus on creating the export file efficiently and avoid PHP array operations, as they are too slow in this case. "Creating the file efficiently" means: appending big blocks of text to other big blocks in the file, each big block being created quickly.
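A minimal sketch of this chunk-wise writing, assuming a made-up file name, chunk size, and sample row contents:

```php
<?php
// Sketch: collect rows in a string buffer and append it to the file in big
// chunks, instead of writing line by line or juggling large PHP arrays.
// File name, chunk size, and the sample row contents are assumptions.
$out = fopen('export.csv', 'w');
fwrite($out, "Title a,Title b,Title c\n");

$buffer = '';
$bufferLimit = 1 << 20; // flush roughly every 1 MB

for ($i = 0; $i < 1000000; $i++) {
    $day = str_pad((string)($i % 31 + 1), 2, '0', STR_PAD_LEFT);
    $buffer .= "\"2014\",\"07\",\"$day\"\n";
    if (strlen($buffer) >= $bufferLimit) {
        fwrite($out, $buffer); // one big write instead of a million small ones
        $buffer = '';
    }
}
fwrite($out, $buffer); // flush the remainder
fclose($out);
```

The point is that concatenating onto a string and flushing it in megabyte-sized pieces keeps both the number of `fwrite()` calls and the peak memory use small.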

Unfortunately, the "big blocks" are rectangles rather than lines. Building my export file will start with lots of line beginnings, like this:

Title a, Title b, Title c 

"2014",  "07",    "01" 

"2014",  "07",    "02" 

...

Then I would have to add a "rectangle" of text to the right of the line beginnings:

Title a, Title b, Title c, extension 1, extension 2, extension 3 

"2014",  "07",    "01",    "23",        "1",         "null" 

"2014",  "07",    "02",    "23",        "1",         "null" 

...

If I have to do this line by line, it will slow me down again. So I'm hoping for a way to add "rectangles" to a file, just as you can in some text editors. Concrete experience with handling huge text buffers in PHP would also be helpful.

As it is not my hosting, I'm not sure whether I have permission to invoke sed/awk.

So the question is: can somebody advise from experience how to handle big CSV files in PHP efficiently (file block operations, file "rectangle" operations), or just how to handle big string buffers in PHP efficiently? There seems to be no framework for string buffers.

Thank you for your attention :-)

Note: This is not a duplicate of this: https://stackoverflow.com/questions/19725129/creating-big-csv-file-in-windows-apache2-php


2 answers

  • dongnan1899 2014-07-12 16:20

    Encouraged by the answers and comments on my question, I've written a short benchmark test.

    Section a) creates 2 files, each with 1 million lines of 100 chars per line. It then merges them into a 3rd file, line by line, like a zipper:

    line1_1 line2_1 
    line1_2 line2_2 
    line1_3 line2_3 
    

    That's what Raphael Müller suggested.
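    The zipper merge can be sketched like this (the file names and the two-line sample inputs are made up for illustration):

```php
<?php
// Sketch of the "zipper" merge from section a): write the line beginnings and
// the "rectangle" of extra columns to two separate files, then combine them
// line by line into the final CSV. File names and contents are assumptions.

// Sample inputs standing in for the two intermediate files:
file_put_contents('beginnings.csv', "\"2014\",\"07\",\"01\"\n\"2014\",\"07\",\"02\"\n");
file_put_contents('extensions.csv', "\"23\",\"1\",\"null\"\n\"23\",\"1\",\"null\"\n");

$left  = fopen('beginnings.csv', 'r');
$right = fopen('extensions.csv', 'r');
$out   = fopen('merged.csv', 'w');

while (($a = fgets($left)) !== false && ($b = fgets($right)) !== false) {
    // drop the left line's newline and append the right line behind it
    fwrite($out, rtrim($a, "\r\n") . ',' . $b);
}
fclose($left);
fclose($right);
fclose($out);
// merged.csv now holds: "2014","07","01","23","1","null" ...
```

    Since only one line from each file is in memory at a time, this merge works no matter how large the two input files are.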

    Section b) fills 1 million rows (same size as in section a) into a MySQL table with two columns. It fills the first column first, with 1 million INSERT statements. Then it fills the second column with a single UPDATE statement. That way, one command handles many rows in one step (the "rectangular" action described in the question). The table would then hold the merged data, ready for readout and download.

    That's what Florin Asavoaie suggested.
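    The two-step database approach can be sketched as follows. It is illustrated here with an in-memory SQLite database via PDO so the sketch runs standalone; the benchmark used MySQL, where the string concatenation in the UPDATE would use CONCAT() instead of ||, but the two-step idea is the same:

```php
<?php
// Sketch of section b): fill the first column with individual INSERTs, then
// fill the second column for all rows with a single UPDATE. Table and column
// names are made up; SQLite stands in for MySQL so this runs standalone.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE export (a TEXT, b TEXT)');

// Step 1: one INSERT per row (this was the slow part in the benchmark)
$insert = $db->prepare('INSERT INTO export (a) VALUES (?)');
for ($i = 1; $i <= 3; $i++) {
    $insert->execute(["line1_$i"]);
}

// Step 2: a single statement fills the whole second "rectangle" at once
$db->exec("UPDATE export SET b = 'line2_' || rowid");

foreach ($db->query('SELECT a, b FROM export ORDER BY rowid') as $row) {
    echo $row['a'] . ' ' . $row['b'] . "\n";
}
```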

    • Filling 1 file with 1 million lines of 100 chars each takes 4.2 seconds. Merging both files into a 3rd file takes 10 seconds.

    • Filling a MySQL table with 1 million rows of 100 chars each using single INSERT statements takes 440 seconds. So I didn't even measure the second step.

    This is not a final conclusion about the performance of databases or file systems in general. The database could probably be optimized, given some freedom at the hosting (which I don't have).

    I think for now it is somewhat safe to assume this performance order:

    1. RAM
    2. File System
    3. Database

    Which means: if your RAM is bursting at the seams because you are creating an export file, don't hesitate to write it out to files in parts and merge them, rather than going to great lengths to manage memory blocks yourself.

    PHP is not the language for sophisticated low-level memory block handling. But in the end, you won't need it.

    This answer was selected by the question author as the accepted answer.
