duanheye7909 2012-05-14 03:40

Parsing a huge XML file: remember the last successfully processed node so an offset can be set on the next run

I have some fairly large XML files that are used for a scheduled import, and I use cron to parse them. The problem is that processing takes too long and always exceeds PHP's "max_execution_time". Since I use XMLReader, which lets me read the XML "line by line", the only solution I see is to track the current working node, store its position, and set a node offset on the next cron run.

This is what I have now:

  $xml = new XMLReader;
  $xml->open($file);
  $pointer = 0;

  while($xml->read()) {

    if ($xml->nodeType == XMLReader::ELEMENT && $xml->localName == 'Product') {
      $chunk = array();
      $chunk['ProductID'] = $xml->getAttribute('ProductID');
      $chunk['ProductName'] = $xml->getAttribute('ProductName');
      process_import($chunk); // Process the received data
      save_current_node_in_BD($pointer++); // Store the current position in the DB
    }
  }
  $xml->close();

Is it a good idea to use $pointer++ to count the processed nodes? How do I set an offset for the next cron run?

1 answer

  • douxuanwei1980 2012-05-14 03:56

    First of all, when you run PHP from cron, you normally use the CLI version, which has a default max_execution_time of 0 (no limit). If you can't change that, continue reading.
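
    Before building anything more elaborate, it may be worth confirming which SAPI and time limit the cron job actually runs with. The snippet below is a minimal sketch; has_no_time_limit() is a hypothetical helper name, and it assumes set_time_limit() has not been disabled on the host.

```php
<?php
// Hypothetical helper: true when no execution time limit is in effect.
function has_no_time_limit(): bool
{
    return (int) ini_get('max_execution_time') === 0;
}

// Try to lift the limit when running from the command line.
// Assumption: the host allows set_time_limit(); some hosts disable it.
if (PHP_SAPI === 'cli' && function_exists('set_time_limit')) {
    set_time_limit(0);
}

echo PHP_SAPI, ': max_execution_time = ', ini_get('max_execution_time'), PHP_EOL;
```

    If the reported SAPI is not "cli", the cron entry is probably calling the script through the web server or a CGI binary, and the limit from php.ini applies.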

    If your XML can be parsed within the time limit (parsing only, no processing), you can have two cron jobs:

    1. The first cron job parses the XML and dumps new tasks onto a pile.
    2. The second cron job takes work from the pile, processes it, and then removes it from the pile.

    The pile can be implemented in several ways, for example:

    • A database table
    • A directory of work items (each work item is one file)
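
    The directory variant of the pile could look roughly like this. It is only a sketch: pile_push() and pile_take() are hypothetical names, and storing each item as a JSON file is an assumption made for the example. The first cron job would call pile_push() for every Product node, and the second would call pile_take() until it returns false or time runs short.

```php
<?php
// Hypothetical helpers: a "pile" stored as one JSON file per work item.

// Adds one work item to the pile and returns the path of its file.
function pile_push(string $dir, array $item): string
{
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    $final = $dir . '/' . uniqid('item_', true) . '.json';
    $tmp = $final . '.tmp';
    // Write under a temporary name, then rename, so the worker
    // never picks up a half-written file.
    file_put_contents($tmp, json_encode($item));
    rename($tmp, $final);
    return $final;
}

// Processes one item from the pile; returns false when the pile is empty.
function pile_take(string $dir, callable $process): bool
{
    $files = glob($dir . '/*.json') ?: [];
    if (count($files) === 0) {
        return false;
    }
    sort($files);
    $file = $files[0];
    $item = json_decode(file_get_contents($file), true);
    $process($item);
    unlink($file); // removed only after processing returned normally
    return true;
}
```

    Because the file is deleted only after the callback returns, an item that is interrupted halfway stays on the pile and is retried on the next run, so the processing step should tolerate being repeated.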

    Edit

    If you can't disable the execution time limit, you can keep a small file that contains the XML file name and the current position. On each run you can open this file to determine whether there is still work to be done. To make sure the file is saved when the time runs out, you need to register a shutdown function.
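
    A minimal sketch of that idea follows. The function names and the JSON layout of the state file are assumptions made for illustration. XMLReader is forward-only, so "setting an offset" here means reading the file again and skipping the Product nodes that were already counted as done.

```php
<?php
// Hypothetical sketch: resume an XMLReader import from a saved position.

// Returns how many Product nodes were finished in earlier runs.
function load_position(string $stateFile, string $xmlFile): int
{
    if (!is_file($stateFile)) {
        return 0;
    }
    $state = json_decode(file_get_contents($stateFile), true);
    // Ignore a checkpoint that belongs to a different XML file.
    if (!is_array($state) || ($state['file'] ?? null) !== $xmlFile) {
        return 0;
    }
    return (int) ($state['position'] ?? 0);
}

function save_position(string $stateFile, string $xmlFile, int $position): void
{
    $state = ['file' => $xmlFile, 'position' => $position];
    file_put_contents($stateFile, json_encode($state), LOCK_EX);
}

// Processes the remaining Product nodes; returns the total finished so far.
function import_products(string $xmlFile, string $stateFile, callable $process): int
{
    $start = load_position($stateFile, $xmlFile);
    $done = $start;

    // Runs on normal exit and also after a fatal "maximum execution time" error.
    register_shutdown_function(function () use ($stateFile, $xmlFile, &$done) {
        save_position($stateFile, $xmlFile, $done);
    });

    $xml = new XMLReader();
    $xml->open($xmlFile);
    $seen = 0;
    while ($xml->read()) {
        if ($xml->nodeType !== XMLReader::ELEMENT || $xml->localName !== 'Product') {
            continue;
        }
        if ($seen++ < $start) {
            continue; // already handled in a previous run
        }
        $process([
            'ProductID' => $xml->getAttribute('ProductID'),
            'ProductName' => $xml->getAttribute('ProductName'),
        ]);
        $done++; // count a node only after it was fully processed
    }
    $xml->close();
    return $done;
}
```

    Counting nodes, as with $pointer++ in the question, is enough for this as long as the XML file does not change between runs. Using absolute paths for both files is safer, because the working directory inside a shutdown function can differ under some SAPIs.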

    This answer was selected by the asker as the best answer.