dsfs21312 2018-10-15 18:09 · 56 views · accepted

How do I manage PHP memory?

I wrote a one-off script that I use to parse PDFs saved in the database. It was working fine until I ran out of memory after parsing 2,700+ documents.

The basic flow of the script is as follows:

  1. Get a list of all the document IDs to be parsed and save it as an array in the session (~155k documents).
  2. Display a page that has a button to start parsing
  3. When that button is clicked, make an AJAX request that parses the first 50 documents in the session array:

// Make sure the session is started before reading from it
if(session_status() == PHP_SESSION_NONE) {
  session_start();
}

$files = $_SESSION['files'];

$ids = array();

// Take the first 50 IDs for this batch; note that array_slice() copies the (large) list
$slice = array_slice($files, 0, 50);
$files = array_slice($files, 50, null); // remove the 50 we are parsing on this request

// Persist the shortened queue, then release the session lock early
$_SESSION['files'] = $files;
session_write_close();

// Build one named bind placeholder per document ID in the batch
for($i = 0; $i < count($slice); $i++) {
  $ids[] = ":id_{$i}";
}
$ids = implode(", ", $ids);

$sql = "SELECT d.id, d.filename, d.doc_content
  FROM proj_docs d
  WHERE d.id IN ({$ids})";

$stmt = oci_parse($objConn, $sql);
for($i = 0; $i < count($slice); $i++) {
  oci_bind_by_name($stmt, ":id_{$i}", $slice[$i]);
}
oci_execute($stmt, OCI_DEFAULT);
$cnt = oci_fetch_all($stmt, $data); // buffers every row of the batch in $data
oci_free_statement($stmt);

# Do the parsing..
# Output a table row..

  4. The response to each AJAX request includes a status indicating whether the script has finished parsing all ~155k documents; if it has not, another AJAX request is made to parse the next 50, with a 5 second delay between requests. A minimal sketch of such a status response follows this list.
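For concreteness, here is what that status payload might look like. This is an editorial sketch, not code from the original script; the done/remaining field names are assumptions:

// Hypothetical tail of the batch endpoint: tell the client whether to keep polling.
header('Content-Type: application/json');
echo json_encode(array(
  'done'      => count($files) === 0, // queue emptied, client can stop
  'remaining' => count($files),       // documents still waiting to be parsed
));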

Questions

  • Why am I running out of memory now? I expected peak memory usage to come at step #1, when the list of all ~155k document IDs is loaded, since that array holds every possible document, not a few minutes later, when the session array holds 2,700 fewer elements.
  • A few similar questions suggested either raising the memory limit to unlimited, which I don't want to do at all, or setting my variables to null when appropriate. I did the latter, as shown below, but I still ran out of memory after parsing ~2,700 documents. What other approaches should I try? (One alternative is sketched after the code below.)

# Freeing some memory space
$batch_size = null;
$with_xfa = null;
$non_xfa = null;
$total = null;
$files = null;
$ids = null;
$slice = null;
$sql = null;
$stmt = null;
$objConn = null;
$i = null;
$data = null;
$cnt = null;
$display_class = null;
$display = null;
$even = null;
$tr_class = null;
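One alternative worth trying (an editorial sketch, not from the original post): oci_fetch_all() buffers the entire batch in $data, so a batch that happens to contain several very large doc_content values drives peak memory up regardless of how variables are nulled afterwards. Fetching one row at a time and releasing each CLOB as soon as it is parsed keeps at most one document in memory. This assumes doc_content is a CLOB column and uses a hypothetical parse_document() in place of the real parsing code:

// Fetch rows one at a time instead of buffering the whole batch.
// Without the OCI_RETURN_LOBS flag, CLOB columns come back as OCI-Lob objects.
oci_execute($stmt, OCI_DEFAULT);
while(($row = oci_fetch_array($stmt, OCI_ASSOC)) !== false) {
  $content = $row['DOC_CONTENT']->load(); // read this one CLOB into memory
  $row['DOC_CONTENT']->free();            // release the LOB descriptor right away
  parse_document($row['ID'], $row['FILENAME'], $content); // hypothetical parser
  unset($content);                        // drop the string before the next row
}
oci_free_statement($stmt);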

1 Answer

  • doufan3958 2018-10-15 20:13

    So I'm not really sure why, but reducing the number of documents I parse per batch from 50 down to 10 seems to fix the issue. I've gone past 5,000 documents now and the script is still running. My only guess is that at 50 per batch I must have hit runs of large files that used up all of the allotted memory.
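    To test that guess, one could log each document's size and the per-batch peak memory (a hedged sketch; it assumes $data comes from the oci_fetch_all() call in the question and that doc_content arrives as a string there):

    // Log each document's size plus the batch's peak memory to spot outliers.
    foreach($data['DOC_CONTENT'] as $i => $content) {
      error_log(sprintf("doc %s: %.1f MB", $data['ID'][$i], strlen($content) / 1048576));
    }
    error_log(sprintf("batch peak: %.1f MB", memory_get_peak_usage(true) / 1048576));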

    Update #1

    I got another out-of-memory error at 8,500+ documents. I've reduced the batches further, down to 5 documents each, and will see tomorrow whether it parses everything. If that fails, I'll just increase the memory allocated temporarily.

    Update #2

    So it turns out that the only reason I was running out of memory is that we apparently have multiple PDF files over 300 MB uploaded to the database. I increased the memory allotted to PHP to 512 MB, and that seems to have let me finish parsing everything.
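    For reference, the limit can also be raised for just this one script rather than globally; memory_limit is changeable at runtime (the 512M value mirrors what worked above):

    // Raise the limit for this request only; the global php.ini setting stays untouched.
    ini_set('memory_limit', '512M');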

