I wrote a one-off script to parse PDFs stored in the database. It worked fine until it ran out of memory after parsing 2,700+ documents.
The basic flow of the script is as follows:
- Get a list of all the document IDs to be parsed and save it as an array in the session (~155k documents).
- Display a page that has a button to start parsing
- When that button is clicked, make an AJAX request that parses the first 50 documents in the session array:
    $files = $_SESSION['files'];
    $ids = array();
    $slice = array_slice($files, 0, 50);
    $files = array_slice($files, 50, null); // remove the 50 we are parsing on this request
    if(session_status() == PHP_SESSION_NONE) {
        session_start();
    }
    $_SESSION['files'] = $files;
    session_write_close();
    for($i = 0; $i < count($slice); $i++) {
        $ids[] = ":id_{$i}";
    }
    $ids = implode(", ", $ids);
    $sql = "SELECT d.id, d.filename, d.doc_content
            FROM proj_docs d
            WHERE d.id IN ({$ids})";
    $stmt = oci_parse($objConn, $sql);
    for($i = 0; $i < count($slice); $i++) {
        oci_bind_by_name($stmt, ":id_{$i}", $slice[$i]);
    }
    oci_execute($stmt, OCI_DEFAULT);
    $cnt = oci_fetch_all($stmt, $data);
    oci_free_statement($stmt);
    # Do the parsing..
    # Output a table row..
- The response to the AJAX request includes a status indicating whether the script has finished parsing all ~155k documents. If it's not done, another AJAX request is made to parse the next 50. There's a 5-second delay between requests.
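The batching steps above can be sketched in isolation, without the session or the database. This is only a minimal illustration: `take_batch()` and `placeholders()` are hypothetical helper names that do not appear in my script, and the `done` flag stands in for the status field of the AJAX response.

```php
<?php
// Hypothetical helper (not in the original script): split off the next batch
// of IDs and report whether anything is left to parse.
function take_batch(array $files, int $batchSize = 50): array
{
    $slice = array_slice($files, 0, $batchSize);
    $remaining = array_slice($files, $batchSize);

    return [
        'slice'     => $slice,
        'remaining' => $remaining,
        'done'      => count($remaining) === 0,
    ];
}

// Hypothetical helper: build the named placeholders for the IN (...) clause,
// e.g. ":id_0, :id_1, :id_2" for a batch of three IDs.
function placeholders(int $count): string
{
    $names = [];
    for ($i = 0; $i < $count; $i++) {
        $names[] = ":id_{$i}";
    }

    return implode(', ', $names);
}

// Stand-in for the session array: 120 fake document IDs.
$batch = take_batch(range(1, 120), 50);

// Shape of the status the client would use to decide on another request.
echo json_encode([
    'parsed' => count($batch['slice']),
    'done'   => $batch['done'],
]), PHP_EOL;
```

With 120 fake IDs and a batch size of 50, the first call leaves 70 IDs and `done` stays `false`, so the client would schedule another request.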
Questions
- Why am I running out of memory now? I expected peak memory usage to occur in step #1, when the session array holds all of the document IDs, not a few minutes later when it holds 2,700 fewer elements.
- I saw a few questions similar to mine. Some suggested setting the memory limit to unlimited, which I don't want to do at all. Others suggested setting my variables to null when appropriate. I did that, but I still ran out of memory after parsing ~2,700 documents. What other approaches should I try?
# Freeing some memory space
$batch_size = null;
$with_xfa = null;
$non_xfa = null;
$total = null;
$files = null;
$ids = null;
$slice = null;
$sql = null;
$stmt = null;
$objConn = null;
$i = null;
$data = null;
$cnt = null;
$display_class = null;
$display = null;
$even = null;
$tr_class = null;
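One thing that might help narrow this down is logging memory usage for each request, since PHP enforces `memory_limit` per request. The sketch below is an assumption about how that could be done, not something my script does today: `measure_batch()` is a hypothetical helper, and the closure only allocates and releases a 5 MB string as a stand-in for fetching and parsing 50 documents.

```php
<?php
// Hypothetical helper: run one batch of work and report memory figures for it.
function measure_batch(callable $work): array
{
    $before = memory_get_usage();

    $work();

    // Collect any reference cycles left behind by the batch before measuring.
    gc_collect_cycles();

    return [
        'before_bytes' => $before,
        'after_bytes'  => memory_get_usage(),
        // Highest usage reached so far in this request, not just in this batch.
        'peak_bytes'   => memory_get_peak_usage(),
    ];
}

$stats = measure_batch(function () {
    // Stand-in for "fetch 50 documents and parse them".
    $data = str_repeat('x', 5 * 1024 * 1024);
    $data = null;
});

// Written to the error log so the AJAX response body is left untouched.
error_log(json_encode($stats));
```

If `peak_bytes` is far above `after_bytes` for some requests, that would suggest a particular batch (for example one containing very large PDFs) is responsible rather than a gradual build-up.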