douci2516 2013-07-22 08:48

PHP: is exec awk or fread faster for reading a column from a very large file?

I have a file containing plot data. Each line has 4 coordinates, and in total the data file can exceed 1 GB. Say I want to extract the third column of the data file: which method is considered good practice, and which is faster?

Using exec():

exec("awk '{ print $3 }' data", $output);

Using PHP script:

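$data = file("data", FILE_IGNORE_NEW_LINES);
$points = array();
foreach ($data as $line) {
    // $line[2] would be the third character, not the third column,
    // so split on runs of whitespace (as awk does) and take field 3.
    $fields = preg_split('/\s+/', trim($line));
    $points[] = $fields[2];
}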

Moreover, since the server does not allow reading such a large file at once, I have to use fread to read the file in several parts. But fread is not line-aware, so some extra work is needed to stitch together the line that gets split at the end of each part. Any suggestions, or is there a better way to read a column from a file in PHP?
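
The stitching described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `readColumnChunked`, the 1 MiB default chunk size, and the use of a generator are not from the original post, and columns are assumed to be separated by whitespace, as awk assumes by default.

```php
<?php
// Sketch: read a file in fixed-size chunks with fread() and carry the
// trailing partial line over to the next chunk. All names here are
// illustrative, not taken from the original post.
function readColumnChunked(string $path, int $column, int $chunkSize = 1048576): Generator
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        $carry = '';   // incomplete last line left over from the previous chunk

        while (!feof($fp)) {
            $chunk = fread($fp, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break;
            }

            $lines = explode("\n", $carry . $chunk);
            // The last element is either '' (the chunk ended exactly on a
            // newline) or a partial line; keep it for the next iteration.
            $carry = array_pop($lines);

            foreach ($lines as $line) {
                $fields = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
                if (isset($fields[$column - 1])) {
                    yield $fields[$column - 1];
                }
            }
        }

        // A final line without a trailing newline is still sitting in $carry.
        $fields = preg_split('/\s+/', $carry, -1, PREG_SPLIT_NO_EMPTY);
        if (isset($fields[$column - 1])) {
            yield $fields[$column - 1];
        }
    } finally {
        fclose($fp);
    }
}
```

Because the function yields values one at a time, a caller can loop over `readColumnChunked("data", 3)` with `foreach` without holding the whole column in memory. Whether this is fast enough for a file over 1 GB is a separate question that only a measurement on the real data can settle.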


  • duangaixing1509 2013-07-22 09:13

    Here /file is a 3.1 GB file:

    root# time awk '{ print $3 }' /file >/dev/null
    
    real   1m42.430s
    user   1m0.241s
    sys    0m2.198s
    

    Okay, about 1.7 minutes for awk. Now let's test PHP (without field splitting, just taking the third character of each line):

    root# time php -r '$fp = fopen("/file", "r"); while (($buf = fgets($fp)) !== false) echo $buf[2]; fclose($fp);' >/dev/null
    
    real   4m17.322s
    user   3m16.571s
    sys    0m31.625s
    

    About 4.3 minutes for PHP! I don't want to imagine how long it would take if I used @Jack's code...

    PHP is far slower than awk. On really big files, use awk (invoked via exec()). As you can see here, PHP spends far more time in user space (roughly three times as much as awk).
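
    One thing to keep in mind when calling awk from PHP: exec() collects every output line into a single array, which for a file of this size could itself become large. A possible alternative, sketched here under assumptions, is to stream awk's output through popen(). The function name `awkColumn` and the callback parameter are hypothetical, and the sketch assumes awk is available on the server's PATH.

    ```php
    <?php
    // Sketch: stream awk's output line by line instead of letting exec()
    // gather everything into one array. Names are illustrative only.
    function awkColumn(string $path, int $column, callable $onValue): void
    {
        // $column is typed as int and both arguments go through
        // escapeshellarg(), so neither can inject shell syntax.
        $cmd = 'awk ' . escapeshellarg('{ print $' . $column . ' }')
             . ' ' . escapeshellarg($path);

        $proc = popen($cmd, 'r');
        if ($proc === false) {
            throw new RuntimeException('Cannot start awk');
        }

        // Hand each value to the callback as soon as awk prints it.
        while (($line = fgets($proc)) !== false) {
            $onValue(rtrim($line, "\n"));
        }
        pclose($proc);
    }
    ```

    A call such as `awkColumn('/file', 3, function ($v) { /* use $v */ });` would then process the third column one value at a time. This has not been timed against the exec() variant above.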
