dongtan8122 2015-08-04 23:54
Accepted

Icecat and PHP files.index.xml

I have several scripts running that download the daily XML index, look for every .xml file referenced in it, and download each one to a different folder, like so:

                    1234.xml
                  / 
daily.index.xml - - 4567.xml
                  \
                    6789.xml

Now I want to do the same with the files.index.xml file, but every time I try to open the index file the server stops with:

PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 1073217536 bytes)

Is there a way to open and dissect files.index.xml without constantly crashing my server?

Update: I believe the server hangs somewhere while the script is running, as some XML files are being stored in the directory.

Script:

// URL for index file
$url = "http://data.icecat.biz/export/level4/EN/files.index.xml";


// Custom header (username/pass is a paid account, so I can't share the credentials)
$context = stream_context_create(array (
    'http' => array (
        'header' => 'Authorization: Basic ' . base64_encode("username:pass")
    )
));

// Get XML File
$indexfile = file_get_contents($url, false, $context);


// Save XML
$file = '../myhomeservices/fullindex/files_index.xml';
unlink($file);
$dailyfile = fopen($file, "w") or die("Unable to open file!");
chmod($file, 0777); // chmod() expects a path, not a file handle
// Write the contents back to the file
$dailyxmlfile = fwrite($dailyfile, $indexfile);
if (!$dailyxmlfile) {
    echo 'Error!';
}
fclose($dailyfile);

The Apache log shows that the 'file_get_contents($url, false, $context);' call is what exhausts the memory.

I'm currently trying to upload files.index.xml (a 1.41 GB file) in the hope that I can process it this way.

1 answer

  • donglian8407 2015-08-05 00:26

    Based on the information provided, there are two issues here. The most direct one is that file_get_contents() tries to allocate roughly another 1 GB for your PHP script after it has already reached its 1 GB memory limit (which is itself much higher than the default). Assuming you're using PHP 5.1+, you can use fopen() and file_put_contents() together to stream the file from HTTP to disk without holding it all in memory:

    <?php
    $url = "http://data.icecat.biz/export/level4/EN/files.index.xml";
    
    // Custom header (username/pass is a paid account, so I can't share the credentials)
    $context = stream_context_create(array (
        'http' => array (
            'header' => 'Authorization: Basic ' . base64_encode("username:pass")
        )
    ));
    
    $file = '../myhomeservices/fullindex/files_index.xml';
    @unlink($file); 
    
    // Open the remote file as a stream; file_put_contents() copies the
    // stream to disk in chunks instead of loading it into memory first
    $remote = fopen($url, 'r', false, $context);
    if ($remote === false || file_put_contents($file, $remote) === false)
    {
        echo 'Error!';  
    }
    else
    {
        // chmod() needs an existing file, so call it after the write
        chmod($file, 0777); 
    }
    

    If you need more control over the buffering, you can fread() a fixed-size buffer from HTTP and fwrite() the buffer to the output file as you read it. You can also use the PHP cURL Extension to download the file, if you'd rather cURL handle the buffering.
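
    The manual fread()/fwrite() approach described above could look roughly like the sketch below. The function name copyInChunks() and the 1 MB default chunk size are assumptions made for illustration and are not part of the original script. The source is any readable stream, for example the handle returned by fopen($url, 'rb', false, $context).

    ```php
    <?php
    /**
     * Copy an open stream to a file in fixed-size chunks, so that at most
     * $chunkSize bytes are held in memory at any time.
     * Returns the number of bytes written, or false on failure.
     * (Hypothetical helper, named for illustration only.)
     */
    function copyInChunks($source, $destPath, $chunkSize = 1048576)
    {
        $dest = fopen($destPath, 'wb');
        if ($dest === false) {
            return false;
        }

        $total = 0;
        // fread() returns an empty string once the stream is exhausted
        while (($chunk = fread($source, $chunkSize)) !== false && $chunk !== '') {
            $written = fwrite($dest, $chunk);
            if ($written === false) {
                fclose($dest);
                return false;
            }
            $total += $written;
        }

        fclose($dest);
        return $total;
    }
    ```

    With the $url, $context and $file variables from the script above, a call would be copyInChunks(fopen($url, 'rb', false, $context), $file). The loop is also a natural place for progress logging or a retry policy, if you need them.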

    As posted, your code reads the entire remote file into memory, then makes a copy of the whole thing as it writes it into the output file.
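
    To avoid both the in-memory read and the extra copy, the cURL option mentioned earlier can write the response body straight to a file handle. This is a minimal sketch, assuming the cURL extension is installed and the server accepts HTTP Basic authentication, as the Authorization header in the question suggests. The function name downloadWithCurl() and its parameters are hypothetical.

    ```php
    <?php
    /**
     * Download $url directly into $destPath using cURL.
     * cURL writes the body to the file handle as it arrives, so the
     * script's memory use stays small regardless of the file size.
     * Returns true on success, false on failure.
     * (Hypothetical helper, named for illustration only.)
     */
    function downloadWithCurl($url, $destPath, $username, $password)
    {
        $dest = fopen($destPath, 'wb');
        if ($dest === false) {
            return false;
        }

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FILE, $dest);               // stream body to disk
        curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);  // HTTP Basic auth
        curl_setopt($ch, CURLOPT_USERPWD, $username . ':' . $password);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow redirects
        curl_setopt($ch, CURLOPT_FAILONERROR, true);         // treat HTTP >= 400 as failure

        $ok = curl_exec($ch);
        curl_close($ch); // has no effect on PHP 8+, releases the handle on older versions
        fclose($dest);

        return $ok !== false;
    }
    ```

    A call such as downloadWithCurl($url, $file, 'username', 'pass') would replace the file_get_contents()/fwrite() pair from the question.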

    This answer was selected by the asker as the best answer.