I'm currently writing an application in Laravel and am setting up some scheduled tasks (Artisan commands) that will run periodically -- maybe once or twice a month. These tasks grab some CSV files using cURL, parse them, and import them into my local MySQL database.
However, a couple of these CSV files are quite large -- around 500,000 lines each -- so I have run into some memory issues.
Obviously I can increase the memory in my VM, but I'm wondering what other practices I can implement to ensure that I don't run out of memory. If I parse the files in chunks rather than all at once, will each chunk be treated as a separate system process, or will it still be treated as one long execution by the server?
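To make the chunking idea concrete, here is roughly what I mean (plain `fgetcsv` from the PHP standard library rather than League CSV, just for illustration -- the function name `csvChunks` is mine):

```php
<?php
// Sketch of chunked CSV reading with a generator. Each yielded chunk can be
// processed and then discarded, so memory stays bounded by the chunk size --
// but it is all still ONE PHP process, not a separate process per chunk.
function csvChunks(string $path, int $chunkSize): Generator
{
    $handle = fopen($path, 'r');
    $chunk  = [];
    while (($row = fgetcsv($handle)) !== false) {
        $chunk[] = $row;
        if (count($chunk) === $chunkSize) {
            yield $chunk;
            $chunk = []; // drop the processed rows so PHP can free them
        }
    }
    if ($chunk !== []) {
        yield $chunk; // the final, partial chunk
    }
    fclose($handle);
}
```

Iterating with `foreach (csvChunks($file, 10000) as $chunk) { ... }` keeps at most one chunk of rows in memory at a time, as long as nothing holds a reference to previous chunks.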
EDIT: My current approach. I'm using League CSV for the parsing.
protected $offset = 1;
protected $parse_chunk = 10000;

public function parse()
{
    $this->data = [];

    $this->reader->setOffset($this->offset)->setLimit($this->parse_chunk)->fetchAll(function ($row, $offset) {
        array_push($this->data, array_combine($this->keys, $row));
        $this->offset = $offset;
    });
}
So currently, the above fetches the first 10,000 rows and does the processing I need, which stores them in a new $data array. This data then gets passed to an Import class, but my scripts are currently timing out during the parsing. The above works on its own, but if I put it within a while loop to access all 500,000 records, then it uses too much memory.
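For clarity, the while loop I tried looks roughly like this. The sketch is self-contained, so a plain array stands in for the reader (in the real code each chunk would come from League CSV via setOffset()/setLimit()); the names `importInChunks` and `$import` are mine:

```php
<?php
// Sketch of the chunked while loop: advance the offset each pass, hand the
// chunk off for import, then drop it so the next pass starts from a clean
// slate. If $this->data kept accumulating across passes instead, memory
// would grow with every chunk -- which is the problem described above.
function importInChunks(array $rows, int $chunkSize, callable $import): int
{
    $offset   = 0;
    $imported = 0;
    while (true) {
        $chunk = array_slice($rows, $offset, $chunkSize);
        if ($chunk === []) {
            break; // no rows left
        }
        $import($chunk);           // e.g. hand off to the Import class
        $imported += count($chunk);
        $offset   += count($chunk);
        unset($chunk);             // free the chunk before the next pass
    }
    return $imported;
}
```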
Should I be dispatching each chunk to a queue which gets processed in the background?
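To make the queue idea concrete: I'd compute the offset/limit ranges up front and dispatch one queued job per range. The sketch below covers only the range computation (plain PHP); the dispatch itself would be Laravel's, and `ImportChunkJob` is a hypothetical job class name:

```php
<?php
// Sketch: split the file into (offset, limit) ranges, one per queued job.
// Each job would open the CSV itself and parse only its own range, so no
// chunk of rows has to be serialised onto the queue. The actual dispatch --
// ImportChunkJob::dispatch($path, $offset, $limit) -- is omitted here.
function chunkRanges(int $totalRows, int $chunkSize): array
{
    $ranges = [];
    for ($offset = 0; $offset < $totalRows; $offset += $chunkSize) {
        $ranges[] = [
            'offset' => $offset,
            'limit'  => min($chunkSize, $totalRows - $offset),
        ];
    }
    return $ranges;
}
```

With 500,000 rows and a chunk size of 10,000 this yields 50 ranges, i.e. 50 independent jobs, each with its own memory lifetime.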
EDIT: Added Benchmarks
I spent some time testing how long the different processes take. The parsing had interesting results. Keep in mind, this is JUST parsing -- nothing is being stored in the DB, but everything is being stored in an array in memory.
Parsing 1,000 records: .017 seconds
Parsing 10,000 records: .188 seconds
Parsing 100,000 records: 2.273 seconds
Parsing 500,000 records: Never completes.
I'm using the code above to execute this. What might cause the parsing to take so long (or perhaps fail) at 500,000 records?
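For reference, this is roughly how timings like the above can be taken -- a sketch using only stdlib `microtime()` and `memory_get_peak_usage()`; the `benchmark` helper name is mine, and the closure stands in for the real parse call:

```php
<?php
// Sketch: wrap a unit of work in microtime() and report elapsed seconds plus
// peak memory. Watching peak memory per chunk (not just time) is what shows
// whether memory grows linearly with the number of chunks processed.
function benchmark(callable $work): array
{
    $start = microtime(true);
    $work();
    return [
        'seconds' => microtime(true) - $start,
        'peak_mb' => memory_get_peak_usage(true) / 1048576,
    ];
}
```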