dsc862009 2015-01-12 20:37

Most efficient way to read two files, do the calculations and write out the result with PHP [closed]

I have written a PHP script which opens two very large files (>1 GB), both of which contain 4 columns. The script does a calculation on the corresponding values of each row in each file, and writes out the result to a third file.

My method is incredibly slow. I am using the SplFileObject to read the origin files and move the internal pointer line by line, as described in the code below.

It then writes out the result, line by line. However, both the calculation and the write-out are very slow (the script is slow even if I disable the writes). I presume my method of file reading/writing is very inefficient and I'd appreciate tips for optimization.

function generate_adjusted($WGFile, $RFile) {

        // File Read objects
        $WGObj = new SplFileObject($WGFile);
        $RObj = new SplFileObject($RFile);

        // File write object
        $adjHandle = fopen("outputfile.txt", 'w+');

        foreach ($WGObj as $line) {
            // Line 0: ID1 (int), 1: ID2 (int), 2: NSNPs (int), 3: Relationship (real)
            $WGline = explode("\t", $WGObj->current());

            // Seek to the same line of second file
            $RObj->seek($WGObj->key());

            $Rline = explode("\t", $RObj->current());
            $A1 = floatval($WGline[2] * $WGline[3]);
            $A2 = floatval($Rline[2] * $Rline[3]);
            $ANSNP = $WGline[2] - $Rline[2];
            $A3 = round(floatval(($A1 - $A2) / $ANSNP), 3);

            // Construct the adjusted line
            $adjLine = $WGline[0] . "\t" . $WGline[1] . "\t" . $ANSNP . "\t" . $A3 . "\n";

            fwrite($adjHandle, $adjLine);
        }
        fclose($adjHandle);     
}

generate_adjusted('inputfile1.txt', 'inputfile2.txt');

1 answer

  • doukuiqian5345 2015-01-12 21:15

    First and best advice: benchmark it!

    Do not take any advice you get as a definitive fact (not even mine). Performance will vary based on your operating system, hardware and PHP version.
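    If you want to compare several variants on your own machine, a small timing wrapper is usually enough. This is only a sketch: `benchmark()` is a hypothetical helper name, and the loop inside the closure is a stand-in for whatever code you actually want to measure.

```php
<?php

// Hypothetical helper: runs $fn $runs times and returns the best
// wall-clock time in seconds.
function benchmark($fn, $runs = 3) {
    $best = INF;
    for ($i = 0; $i < $runs; ++$i) {
        $start = microtime(true);
        $fn();
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$seconds = benchmark(function () {
    // Stand-in workload; replace with the code under test.
    $sum = 0;
    for ($i = 0; $i < 100000; ++$i) {
        $sum += $i;
    }
});

printf("Best of 3 runs: %.4f s, peak memory: %d bytes%s", $seconds, memory_get_peak_usage(), PHP_EOL);
```

    Taking the best of a few runs reduces noise from disk caches and other processes, but for files this large a single run with realistic input is probably the more honest number.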

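    The following should be a fast approach and directly includes a micro-benchmark for you. It reads both files sequentially, so it avoids the per-row SplFileObject::seek() call in your script, which rewinds and re-reads the file from the start every time. Please test it and let us know.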

    <?php
    
    $start = microtime(true);
    
    function get_file_handle($file, $mode) {
        $h = fopen(__DIR__ . DIRECTORY_SEPARATOR . $file, "{$mode}b");
        if (!$h) {
            trigger_error("Could not open {$file}.", E_USER_ERROR);
        }
    
        // Readers take a shared lock, writers an exclusive one.
        // NOTE: The two flags must not be combined, LOCK_SH | LOCK_EX has the same value as LOCK_UN.
        $lock = ($mode === "r") ? LOCK_SH : LOCK_EX;
        if (flock($h, $lock) === false) {
            trigger_error("Could not acquire lock for {$file}", E_USER_ERROR);
        }
    
        return $h;
    }
    
    // We only want to read and not write.
    $input_handle1 = get_file_handle("input1", "r");
    $input_handle2 = get_file_handle("input2", "r");
    
    // We only want to write and not read.
    $output_handle = get_file_handle("output", "w");
    
    // Read from both files at the same time the next line.
    // NOTE: This only works if lines are always corresponding in both files.
    while (($buffer1 = fgets($input_handle1)) !== false && ($buffer2 = fgets($input_handle2)) !== false) {
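        // Strip the line ending first, otherwise the last column is not a clean numeric string
        // (PHP 7 raises a notice for values like "0.5\n").
        $buffer1 = explode("\t", rtrim($buffer1, "\r\n"));
        $buffer2 = explode("\t", rtrim($buffer2, "\r\n"));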
    
        // Forget floatval, let PHP do its dynamic casting.
        // NOTE: If precision is important use e.g. bcmath!
        $a1 = $buffer1[2] * $buffer1[3];
        $a2 = $buffer2[2] * $buffer2[3];
        $ansnp = $buffer1[2] - $buffer2[2];
        $a3 = round(($a1 - $a2) / $ansnp, 3);
    
        if (fwrite($output_handle, "{$buffer1[0]}\t{$buffer1[1]}\t{$ansnp}\t{$a3}\n") === false) {
            trigger_error("Could not write result to output file.", E_USER_ERROR);
        }
    }
    
    // Release locks on and close all file handles.
    foreach (array($input_handle1, $input_handle2, $output_handle) as $delta => $handle) {
        if (flock($handle, LOCK_UN) === false) {
            trigger_error("Could not release lock!", E_USER_ERROR);
        }
        if (fclose($handle) === false) {
            trigger_error("Could not close file handle!", E_USER_ERROR);
        }
    }
    
    echo "Finished processing after " , (microtime(true) - $start) , PHP_EOL;
    

    Of course this could be done in OO fashion as well with exceptions etc.
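    A minimal sketch of what such an OO variant might look like, assuming the same 4-column tab-separated layout as in the question. The function name `generate_adjusted_oo()` and the choice of exception classes are mine, not something the script above prescribes. It uses SPL's MultipleIterator to walk both files in lockstep without any seeking.

```php
<?php

/**
 * Hypothetical OO variant (names are illustrative only).
 * Columns, as in the question: 0: ID1, 1: ID2, 2: NSNPs, 3: Relationship.
 * Returns the number of rows written.
 */
function generate_adjusted_oo($wgFile, $rFile, $outFile)
{
    // Read ahead, skip empty lines and strip line endings while iterating.
    $flags = SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE;

    // SplFileObject throws a RuntimeException if a file cannot be opened.
    $wg = new SplFileObject($wgFile, 'rb');
    $wg->setFlags($flags);
    $r = new SplFileObject($rFile, 'rb');
    $r->setFlags($flags);
    $out = new SplFileObject($outFile, 'wb');

    // MultipleIterator advances both files together and stops as soon as one ends.
    $pairs = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_NUMERIC);
    $pairs->attachIterator($wg);
    $pairs->attachIterator($r);

    $rows = 0;
    foreach ($pairs as $pair) {
        $a = explode("\t", $pair[0]);
        $b = explode("\t", $pair[1]);
        if (count($a) < 4 || count($b) < 4) {
            throw new UnexpectedValueException("Row " . ($rows + 1) . " does not have 4 columns.");
        }

        $ansnp = $a[2] - $b[2];
        if ($ansnp == 0) {
            throw new DomainException("Row " . ($rows + 1) . ": equal NSNPs, cannot divide by zero.");
        }
        $a3 = round(($a[2] * $a[3] - $b[2] * $b[3]) / $ansnp, 3);

        $out->fwrite("{$a[0]}\t{$a[1]}\t{$ansnp}\t{$a3}\n");
        ++$rows;
    }

    // If one file still has lines left, the inputs did not line up.
    if ($wg->valid() || $r->valid()) {
        throw new LengthException("Input files have a different number of rows.");
    }

    return $rows;
}
```

    Call it as `generate_adjusted_oo('inputfile1.txt', 'inputfile2.txt', 'outputfile.txt')` inside a try/catch block. Whether this is faster or slower than the plain fgets loop is something to measure rather than assume.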

    Line Buffering

    // Determines how many lines to buffer between each calculation/write.
    $lines_to_buffer = 1000;
    
    while (!feof($input_handle1) && !feof($input_handle2)) {
        // Start every chunk with empty buffers and counters.
        $c1 = $c2 = 0;
        $buffer1 = $buffer2 = array();
        $result = "";
    
        // Read lines from the first handle, then read lines from the second handle.
        // NOTE: Reading multiple lines from the same file in a row allows us to make best use of the hard disk, if it isn't
        // an SSD, since we consecutively read from the same location which yields minimum seeks. But also keep in mind that
        // this might not be true if multiple processes are running in parallel, since they might read from different files
        // at the same time.
        foreach (array(1 => $input_handle1, 2 => $input_handle2) as $i => $handle) {
            while (($line = fgets($handle)) !== false) {
                ${"buffer{$i}"}[] = explode("\t", rtrim($line, "\r\n"));
    
                // Break if we read enough lines.
                if (++${"c{$i}"} === $lines_to_buffer) {
                    break;
                }
            }
        }
    
        // Both files must yield the same number of lines per chunk.
        if ($c1 !== $c2) {
            trigger_error("Line counts of the input files differ, aborting.", E_USER_ERROR);
        }
    
        // Only iterate over the lines actually read, the last chunk is usually shorter.
        for ($i = 0; $i < $c1; ++$i) {
            $a1 = $buffer1[$i][2] * $buffer1[$i][3];
            $a2 = $buffer2[$i][2] * $buffer2[$i][3];
            $ansnp = $buffer1[$i][2] - $buffer2[$i][2];
            $a3 = round(($a1 - $a2) / $ansnp, 3);
            $result .= "{$buffer1[$i][0]}\t{$buffer1[$i][1]}\t{$ansnp}\t{$a3}\n";
        }
    
        if ($result !== "" && fwrite($output_handle, $result) === false) {
            trigger_error("Could not write result to output file.", E_USER_ERROR);
        }
    }
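    If the variable variables in the read loop above feel too clever, the chunk reading could be factored into a small helper instead. The following is a sketch only: `read_chunk()` is a hypothetical name, and the in-memory stream just stands in for one of the real input handles.

```php
<?php

// Hypothetical helper: reads up to $max_lines tab-separated lines from
// $handle and returns them as an array of rows (arrays of column strings).
function read_chunk($handle, $max_lines) {
    $rows = array();
    while (count($rows) < $max_lines && ($line = fgets($handle)) !== false) {
        $rows[] = explode("\t", rtrim($line, "\r\n"));
    }
    return $rows;
}

// Stand-in for a real input file.
$handle = fopen("php://memory", "w+b");
fwrite($handle, "1\t2\t10\t0.5\n3\t4\t20\t0.25\n5\t6\t30\t0.75\n");
rewind($handle);

$first  = read_chunk($handle, 2); // two rows
$second = read_chunk($handle, 2); // the one remaining row
$third  = read_chunk($handle, 2); // empty array at end of file
fclose($handle);
```

    With this, the main loop would call `read_chunk()` once per input handle, compare `count()` of both results, and stop when both come back empty.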
    