douci2516 2013-07-22 08:48
Accepted

PHP: is exec awk or fread faster for reading a column from a very large file?

I have a file containing plot data. Each line has 4 coordinates, and the data file can exceed 1 GB. Say I want to extract the third column of the data file: which method is considered good practice, and which is faster?

Using exec:

exec("awk '{ print $3 }' data", $output);

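Using a PHP script:

$data = file("data");
$points = array();
foreach ($data as $line) {
    // $line[2] would only be the third character, so split into fields first.
    $fields = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    if (isset($fields[2]))
        $points[] = $fields[2];
}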

Moreover, since the server does not allow reading a large file in one go, I have to use fread to read the file in several parts. But fread is not line-aware, so some extra work is needed to rejoin the line that gets split at the end of each part. Any suggestions, or a better method for reading a column from a file in PHP?
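
To make the chunk-boundary problem concrete, here is a minimal sketch of one way it could be handled: keep the incomplete tail of each chunk and prepend it to the next one. The function name read_column_chunked, its parameters, and the 1 MiB default chunk size are assumptions made for illustration, and the column index is zero-based.

```php
<?php
// Hypothetical helper (not from the original post): extract one
// whitespace-separated column while reading the file in fixed-size chunks.
function read_column_chunked($path, $column, $chunkSize = 1048576)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    $values = array();
    $carry = ''; // incomplete line left over from the previous chunk
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $lines = explode("\n", $carry . $chunk);
        // The last element is either '' (the chunk ended on a newline) or a
        // partial line; hold it back until the next chunk arrives.
        $carry = array_pop($lines);
        foreach ($lines as $line) {
            $fields = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
            if (isset($fields[$column])) {
                $values[] = $fields[$column];
            }
        }
    }
    fclose($fp);
    // Handle the final line when the file does not end with a newline.
    $fields = preg_split('/\s+/', $carry, -1, PREG_SPLIT_NO_EMPTY);
    if (isset($fields[$column])) {
        $values[] = $fields[$column];
    }
    return $values;
}
```

Called as read_column_chunked("data", 2), this would return the third column. For a file over 1 GB the returned array itself may be too large for the server's memory limit, so in practice each value would probably be processed inside the loop instead of being collected.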

2 answers

  • duangaixing1509 2013-07-22 09:13

    Here /file is a 3.1 GB file:

    root# time awk '{ print $3 }' /file >/dev/null
    
    real   1m42.430s
    user   1m0.241s
    sys    0m2.198s
    

    Okay, about 1.7 minutes for awk. Now let's test PHP (without field splitting, just taking the third character of each line):

    root# time php -r '$fp = fopen("/file", "r"); while (($buf = fgets($fp)) !== false) echo $buf[2]; fclose($fp);' >/dev/null
    
    real   4m17.322s
    user   3m16.571s
    sys    0m31.625s
    

    About 4.3 minutes for PHP! I don't want to imagine how long it would take if I used @Jack's code...

    PHP is far slower than awk. On really big files, use awk (invoked by exec()). As you can see here, PHP spends a lot of time in user space (about three times as much as awk: 3m16s versus 1m0s).
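
    One caveat when invoking awk from PHP: exec() stores every output line in the array passed as its second argument, which can be a lot of memory for a file this size. As a hedged sketch of an alternative (not what was benchmarked above), the awk output could be streamed with popen() and fgets(). The helper name awk_column_stream and its callback parameter are made up for illustration, and the column number is 1-based as in awk.

    ```php
    <?php
    // Hypothetical helper: run awk and hand each value of one column to a
    // callback as it arrives, without buffering the whole output in PHP.
    function awk_column_stream($path, $column, $onValue)
    {
        // The column is cast to int and the path is shell-escaped, so neither
        // can inject extra shell syntax into the command line.
        $cmd = sprintf('awk \'{ print $%d }\' %s', (int) $column, escapeshellarg($path));
        $proc = popen($cmd, 'r');
        if ($proc === false) {
            return false;
        }
        $count = 0;
        while (($line = fgets($proc)) !== false) {
            call_user_func($onValue, rtrim($line, "\n"));
            $count++;
        }
        pclose($proc);
        return $count; // number of lines awk printed
    }
    ```

    For example, awk_column_stream("/file", 3, function ($v) { echo $v, "\n"; }) would print the third column line by line. This assumes awk is installed and that the server permits popen().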

    Accepted by the asker as the best answer.