I have a big gzipped file (~15 GB compressed, ~88 GB uncompressed) and I need to "explode" its content into a large number of files. For example, if I read the following line:
foo property.content "I'm the content of the string."
I need to create a file named foo.db and store inside it:
property.content "I'm the content of the string."
I've managed to do that, but I have performance issues. I think it may be because of the large number of files (~31k files created in 60 seconds), but I'm not sure. That's why I'm here.
My code reads the gz file in chunks of 1048576 bytes (with gzread) and sorts the content into an array, so that each file's content can be written in one go. Then a foreach loop reads my cache, opens the target file, and writes into it. For example, if my cache looks like this:
$cache = array(
    "file_one" => "property.content \"I'm the content of the string.\"
property.foo \"I'm the content of another string.\"",
    "file_two" => "property.foobar \"I'm the content of another string.\"",
    "file_three" => ...
);
The loop does this:
foreach ($cache as $file => $content) {
    $filesrc = $file . ".db";
    $fp = fopen($filesrc, "a");
    fwrite($fp, $content . "\n");
    fclose($fp);
}
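For reference, here is a minimal, self-contained sketch of the whole chunked-read + cache approach described above. The sample data, file names, and the assumption that every line has the shape `<filename> <rest of line>` are mine for illustration; the real source is the big gzipped file. The `$carry` variable keeps a line that gets cut in half at a chunk boundary (a leftover `$carry` at EOF would still need handling).

```php
<?php
// Self-contained sketch of the chunked read + per-file cache approach.
// Writes a tiny sample input.gz first so it can run stand-alone;
// in reality the source is the big ~15 GB gzipped file.

// Sample data (assumption: each line is `<filename> <rest of line>`).
$sample = "foo property.content \"I'm the content of the string.\"\n"
        . "bar property.foo \"Another string.\"\n"
        . "foo property.bar \"A second line for foo.\"\n";
file_put_contents('input.gz', gzencode($sample));

$gz = gzopen('input.gz', 'rb');
$carry = '';                                  // partial line cut at a chunk boundary

while (!gzeof($gz)) {
    $chunk = $carry . gzread($gz, 1048576);   // 1 MiB per read
    $lines = explode("\n", $chunk);
    $carry = array_pop($lines);               // keep the incomplete trailing line

    // Group the chunk's lines by target file so each file is written once.
    $cache = array();
    foreach ($lines as $line) {
        if ($line === '') continue;
        list($file, $rest) = explode(' ', $line, 2);
        if (!isset($cache[$file])) $cache[$file] = '';
        $cache[$file] .= $rest . "\n";
    }

    // One open/append/close per target file, per chunk.
    foreach ($cache as $file => $content) {
        $fp = fopen($file . '.db', 'a');
        fwrite($fp, $content);
        fclose($fp);
    }
}
gzclose($gz);
// Note: if the input does not end with "\n", $carry still holds the last line here.
```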
With this method, I read ~65 MB and write ~31k files in 60 seconds. If I write all the content to a single file instead, I can write ~220 MB in 60 seconds.
Is there anything I can do to improve performance while still creating the small files?
I'm using PHP 5.5.1 with Apache 2.4.6 on Windows, and I'm running this script from the CLI.
Edit: Here is a log with the time profile of each loop, for 131072 bytes of data read: http://pastebin.com/uRPFfywY