Is there anyway I can optimize reading this file?
Btw this is a Laravel application. data_fill()
is a Laravel specific function. Its just my php array building logic.
I took the 150,000 row, 16 column file and chopped it down to a much smaller sample of 4261 rows and it takes about 10-15 seconds. The complete file takes minutes (had to heavily modify nginx and php to allow for this).
Here is a blackfire.io report of the smaller file - https://blackfire.io/profiles/c4087f40-dd5c-42ed-9258-3c6d5a1ace51/graph
Looks like it is reading all 68176 cells (4261 rows * 16 columns) multiple times on the smaller file 349380 times (the darker red boxes).
Spreadsheets attached.
Here is the code i am using to process the file(s). I have hard coded some parameters for testing. Normally these would be request variables to allow for different column selection based on the header row.
public function report02()
{
ini_set('max_execution_time', '400');
// Time how long script takes to run
$executionStartTime = microtime(true);
$reader = ReaderFactory::create(Type::XLSX); // for XLSX files
$reader->open(storage_path('smaller.xlsx'));
$headers = [];
$header_tech = strtoupper('INSTALLER NBR');
$header_tech_index = 0;
$header_equipment_type = strtoupper('ITEM');
$header_equipment_type_index = 0;
$header_equipment_sn = strtoupper('SERIAL NUMBER');
$header_equipment_sn_index = 0;
$header_equipment_status = strtoupper('EQUIP STS');
$header_equipment_status_index = 0;
$header_equipment_age = strtoupper('DAYS ASSIGNED');
$header_equipment_age_index = 0;
$show_old_equipment_limit_in_days = 21;
// convert comma delimitied tech numbers to an array and trim white space
$techs = array_map('trim', explode(",", '9448,69091,69165,69327,69430,69445,69449,69711,70056'));
$data = [
'old_equipment' => [
'total' => 0
]
];
foreach ($reader->getSheetIterator() as $sheet) {
foreach ($sheet->getRowIterator() as $row => $values) {
// Check for last row - end of file
$values2 = array_map('trim', $values);
if (! empty($values2)) {
// Header row position
if ($row == 2) {
$headers = $values2;
$header_tech_index = array_search($header_tech, $headers);
$header_equipment_type_index = array_search($header_equipment_type, $headers);
$header_equipment_sn_index = array_search($header_equipment_sn, $headers);
$header_equipment_status_index = array_search($header_equipment_status, $headers);
$header_equipment_age_index = array_search($header_equipment_age, $headers);
}
// Data after header
if ($row > 2) {
// This row contains data for a tech we requested
if (in_array($values2[$header_tech_index], $techs)) {
// 7 Status
if ($values2[$header_equipment_status_index] == 7) {
if (array_has($data, 'techs.' . $values2[$header_tech_index] . '.counts.7 Status.' . $values2[$header_equipment_type_index])) {
$data['techs'][$values2[$header_tech_index]]['counts']['7 Status'][$values2[$header_equipment_type_index]]++;
} else {
data_fill($data, 'techs.' . $values2[$header_tech_index] . '.counts.7 Status.' . $values2[$header_equipment_type_index], 1);
}
// All tech equipment
data_fill($data, 'techs.' . $values2[$header_tech_index] . '.equipment.7 Status.' . $values2[$header_equipment_sn_index], [
'type' => $values2[$header_equipment_type_index],
'age' => $values2[$header_equipment_age_index],
]);
}
// T Status
if ($values2[$header_equipment_status_index] == 'T') {
if (array_has($data, 'techs.' . $values2[$header_tech_index] . '.counts.T Status.' . $values2[$header_equipment_type_index])) {
$data['techs'][$values2[$header_tech_index]]['counts']['T Status'][$values2[$header_equipment_type_index]]++;
} else {
data_fill($data, 'techs.' . $values2[$header_tech_index] . '.counts.T Status.' . $values2[$header_equipment_type_index], 1);
}
// All tech equipment
data_fill($data, 'techs.' . $values2[$header_tech_index] . '.equipment.T Status.' . $values2[$header_equipment_sn_index], [
'type' => $values2[$header_equipment_type_index],
'age' => $values2[$header_equipment_age_index],
]);
}
// X Days or older
if ($values2[$header_equipment_age_index] >= $show_old_equipment_limit_in_days && ($values2[$header_equipment_status_index] !== 'L')) {
data_fill($data, 'old_equipment.techs.' . $values2[$header_tech_index] . '.' . $values2[$header_equipment_sn_index], [
'type' => $values2[$header_equipment_type_index],
'status' => $values2[$header_equipment_status_index],
'age' => $values2[$header_equipment_age_index],
]);
$data['old_equipment']['total']++;
}
}
}
}
unset($values2);
}
}
$reader->close();
$executionEndTime = microtime(true);
$data['runtime_in_seconds'] = round($executionEndTime - $executionStartTime, 2);
return response()->json($data, 200);
}
Screenshot of smaller file output
Also created a github issue request but it's probably not much of a package issue as it is a implementation one - https://github.com/box/spout/issues/585.
Thanks for any help I receive.