dongzai3917 2013-07-23 15:44

PHP - How to write a large number of files on Windows with good performance

I have a big gzipped file (~15 GB compressed, ~88 GB uncompressed) and I need to "explode" its content into a large number of files. For example, if I read the following line:

foo    property.content    "I'm the content of the string."

I need to create a file named foo.db and store the following inside it:

property.content    "I'm the content of the string."

I've managed to get that working, but I have performance issues. I think it may be because of the large number of files (~31k files created in 60 seconds), but I'm not sure. This is why I'm here.

My code reads the gz file in pieces of 1048576 bytes (with gzread) and sorts the content into an array, so that all the content for a given file is written in one go. Then a foreach loop reads the content of this cache, opens each target file and writes into it. For example, if my cache looks like this:

$cache = array(
    "file_one" => "property.content    \"I'm the content of the string.\"
                   property.foo    \"I'm the content of another string.\"",
    "file_two" => "property.foobar    \"I'm the content of the another string.\"",
    "file_three" => ...
);
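
For reference, a cache of this shape could be built from one decompressed chunk roughly as follows. This is a minimal sketch, not the original code: the helper name buildCache, the tab separator between the file name and the rest of the line, and the $carry variable for a line cut off at the chunk boundary are all assumptions.

```php
<?php
// Hypothetical helper, not the original code: groups the lines of one
// decompressed chunk by target file name.
// Assumptions: fields are tab-separated, and $carry holds the incomplete
// last line left over from the previous chunk.
function buildCache($chunk, &$carry)
{
    $cache = array();
    $lines = explode("\n", $carry . $chunk);
    // The last element may be a partial line; keep it for the next chunk.
    $carry = array_pop($lines);

    foreach ($lines as $line) {
        $line = rtrim($line, "\r"); // tolerate Windows line endings
        if ($line === '') {
            continue;
        }
        // Split into the file name and the rest of the line.
        $parts = explode("\t", $line, 2);
        if (count($parts) < 2) {
            continue; // malformed line, skipped in this sketch
        }
        list($file, $rest) = $parts;
        if (isset($cache[$file])) {
            $cache[$file] .= "\n" . $rest;
        } else {
            $cache[$file] = $rest;
        }
    }
    return $cache;
}

// Usage with made-up sample data: two complete lines and one partial line.
$carry = '';
$cache = buildCache(
    "foo\tproperty.content\t\"a\"\nfoo\tproperty.foo\t\"b\"\nbar\tprop",
    $carry
);
```

After this call, $cache holds one entry for "foo" with both properties joined by a newline, and $carry holds the unfinished "bar" line so it can be prepended to the next chunk.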

The loop does this:

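foreach ($cache as $file => $content) {
    $filesrc = $file . ".db";
    // Open in append mode so content written for earlier chunks is kept.
    $fp = fopen($filesrc, "a");
    // End each batch with a newline so the next append starts on a new line.
    fwrite($fp, $content . "\n");
    fclose($fp);
}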

With this method, I read ~65 MB and write ~31k files in 60 seconds. If I write all the content into a single file instead, I write ~220 MB in 60 seconds.
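
To put those figures side by side, here is a small arithmetic sketch. It only uses the numbers quoted above; it assumes "MB" means megabytes and that the ~65 MB read is also roughly the amount written, neither of which the measurements confirm.

```php
<?php
// Figures quoted in the post (approximate).
$seconds         = 60;
$smallFilesMb    = 65;     // data processed when writing many small files
$smallFilesCount = 31000;  // files created in that time
$singleFileMb    = 220;    // data written when everything goes to one file

$filesPerSecond    = $smallFilesCount / $seconds;
$smallMbPerSecond  = $smallFilesMb / $seconds;
$singleMbPerSecond = $singleFileMb / $seconds;
$ratio             = $singleFileMb / $smallFilesMb;
// Assumption: the ~65 MB read is also what ends up in the small files.
$avgBytesPerFile   = $smallFilesMb * 1024 * 1024 / $smallFilesCount;

printf("files per second:        %.0f\n", $filesPerSecond);
printf("many files throughput:   %.2f MB/s\n", $smallMbPerSecond);
printf("single file throughput:  %.2f MB/s\n", $singleMbPerSecond);
printf("single file is faster by %.1fx\n", $ratio);
printf("average bytes per file:  %.0f\n", $avgBytesPerFile);
```

Under those assumptions, each file receives only about 2 KB, so the cost per file (open, append, close) is a plausible suspect, though the numbers alone do not prove it.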

Is there something I can do to improve performance while still creating the small files? I'm using PHP 5.5.1 with Apache 2.4.6 on Windows, and I run this script from the CLI.

Edit: Here is a log with the time profile of each loop, for 131072 bytes of data read: http://pastebin.com/uRPFfywY
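
The linked log is not reproduced here. As an illustration only, per-step timings for the write loop could be collected along these lines. The temporary directory, the sample cache and the $timings array are made up for the sketch; microtime(true) is the standard PHP call for wall-clock time as a float.

```php
<?php
// Hypothetical timing wrapper around the write loop; the directory and the
// sample data are invented for this sketch.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'explode_demo_' . uniqid();
mkdir($dir);

$cache = array(
    'file_one' => "property.content\t\"I'm the content of the string.\"",
    'file_two' => "property.foobar\t\"I'm the content of another string.\"",
);

// Accumulated wall-clock time per step, in seconds.
$timings = array('open' => 0.0, 'write' => 0.0, 'close' => 0.0);

foreach ($cache as $file => $content) {
    $filesrc = $dir . DIRECTORY_SEPARATOR . $file . '.db';

    $t0 = microtime(true);
    $fp = fopen($filesrc, 'a');
    $t1 = microtime(true);
    fwrite($fp, $content . "\n");
    $t2 = microtime(true);
    fclose($fp);
    $t3 = microtime(true);

    $timings['open']  += $t1 - $t0;
    $timings['write'] += $t2 - $t1;
    $timings['close'] += $t3 - $t2;
}

foreach ($timings as $step => $elapsed) {
    printf("%-5s %.6f s\n", $step, $elapsed);
}
```

Splitting the measurement into open, write and close shows which of the three steps dominates when many small files are involved.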
