duanqilupinf67040 2013-11-21 17:13

Testing whether disk cache buffers have been flushed

I currently have a video file that's being converted to a different format via a shell_exec() call. The call and the format conversion both work correctly; my next step is to push that file up to an S3 bucket.

However, I'd noticed that the filesystem caching won't necessarily flush my newly-written file immediately, so I was pushing a 0-byte file to S3, even though whenever I looked at the file on the filesystem it was the correct length. Inserting an arbitrary 5-second sleep in my code between the call to shell_exec() and the S3 push solved this problem, but it feels very hacky, and I've no way of knowing whether 5 seconds will always be enough, especially when working with larger video files or when the system is under load.

I'm pretty sure that I can't force a disk cache flush unless I execute a sync call (via shell_exec again), but I don't want to use that approach because it will affect all files on the server with any buffered data, not simply the single file that I'm manipulating.
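
(As an aside: PHP 8.1 and later expose fsync(), which flushes a single file's data to disk without a system-wide sync. A minimal sketch, reusing the $myFileName variable from the snippet below:)

$fp = fopen($myFileName, 'r');   // fsync() requires PHP >= 8.1
if ($fp !== false) {
    fsync($fp);                  // blocks until this one file's data is on disk
    fclose($fp);
}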

So I wrote this simple bit of code to monitor the filesize until any disk cache flush is completed:

// Loop until the file has a non-zero size and that size has been
// stable for at least one second
$prevSize = -1;
$size = filesize($myFileName);
while ($prevSize < $size) {
    sleep(1);
    // Drop PHP's cached stat data so filesize() re-reads from the OS
    clearstatcache(true, $myFileName);
    if ($size > 0) {
        $prevSize = $size;
    }
    $size = filesize($myFileName);
}

Basically, it just loops until at least something has been flushed to the file and the file size has stayed the same for at least a second.
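
One thing this loop can't handle is a file that never grows at all. Here's a minimal sketch of the same idea with a hard cap added; the waitForStableSize() helper name and the $maxWait limit are my own invention, not part of the original code:

function waitForStableSize($path, $maxWait = 60) {
    $prevSize = -1;
    for ($i = 0; $i < $maxWait; $i++) {
        clearstatcache(true, $path);
        $size = file_exists($path) ? filesize($path) : 0;
        if ($size > 0 && $size === $prevSize) {
            return true;    // size unchanged for a full second
        }
        $prevSize = $size;
        sleep(1);
    }
    return false;           // gave up after $maxWait seconds
}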

What I don't know is whether a disk flush updates the reported size only once the entire file cache has been flushed to disk, or whether it flushes a few blocks at a time, in which case I might find myself pushing a partially-flushed file to S3 and ending up with a corrupted upload.

Any advice would be appreciated.

EDIT

The existing code looks something like:

private static function pushToS3($oldFilePath, $s3FileName, $newFilePath) {
    // Wait until the converted file's size has stabilised (the loop above)
    self::testFileFlush($newFilePath);
    // Copy the converted file to its S3 destination
    file_put_contents(
        $s3FileName,
        file_get_contents($newFilePath)
    );
}

private function processVideo($oldFilePath, $s3FileName, $newFilePath) {
    // Start conversion; "nohup ... &" backgrounds ffmpeg, so shell_exec()
    // returns the PID immediately while the file is still being written
    $command = "ffmpeg -i \"$oldFilePath\" -y -ar 44100 \"$newFilePath\"";
    $processID = shell_exec("nohup ".$command." >/dev/null & echo $!");

    self::pushToS3($oldFilePath, $s3FileName, $newFilePath);
    unlink($newFilePath);
    unlink($oldFilePath);
}
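
Presumably the $s3FileName above is an s3:// stream path, which only works once a stream wrapper has been registered. A minimal sketch of that setup, assuming the AWS SDK for PHP; the credentials, region, and bucket name are hypothetical placeholders:

require 'vendor/autoload.php';

use Aws\S3\S3Client;

$client = S3Client::factory(array(
    'key'    => 'YOUR_KEY',        // hypothetical credentials
    'secret' => 'YOUR_SECRET',
    'region' => 'us-east-1',
));
// Makes s3://bucket/key paths usable with file_put_contents() et al.
$client->registerStreamWrapper();

// pushToS3() could then write to e.g. "s3://my-bucket/videos/clip.flv"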

This is a modification to old legacy code that ran on a single server, simply storing the files in the server's filesystem; but I've changed the infrastructure to run on multiple AWS EC2 app servers for resilience, using S3 to share file resources between them. Files are uploaded to the app servers by our users, converted to FLV, and pushed to S3 so that they're available to all EC2 instances.

The longer-term solution is going to be AWS Elastic Transcoder, where I can simply push the originals to S3 and submit a queued request to Elastic Transcoder, but that's a while away yet.
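
(For reference, a rough sketch of what that queued request might look like with the AWS SDK for PHP. The pipeline ID, preset ID, keys, and credentials below are hypothetical placeholders, not values from this setup:)

use Aws\ElasticTranscoder\ElasticTranscoderClient;

$transcoder = ElasticTranscoderClient::factory(array(
    'key'    => 'YOUR_KEY',        // hypothetical credentials
    'secret' => 'YOUR_SECRET',
    'region' => 'us-east-1',
));

// Submit a job: Elastic Transcoder reads the original from the pipeline's
// input bucket and writes the converted file to its output bucket
$transcoder->createJob(array(
    'PipelineId' => '1111111111111-abcde1',          // hypothetical pipeline
    'Input'      => array('Key' => 'uploads/original.mpg'),
    'Output'     => array(
        'Key'      => 'converted/original.mp4',
        'PresetId' => '1351620000001-000010',        // hypothetical preset ID
    ),
));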


2 Answers

  • dsn1327 2013-11-21 17:58

    Unless you're doing one of the following, the behaviour you're describing should be impossible:

    1. Writing your data to a temp file, then copying/moving it to the location you're trying to upload.
    2. Mounting the same partition with two different machines, one writing the file and the other attempting to upload it.
    3. Some sort of hacky software buffering is happening.

    Otherwise the FS cache should be completely transparent to anything running on the OS: any request for data that hasn't yet been written to the physical disk is served straight from the cache by the OS.
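
    That transparency is easy to see in a quick test. A minimal sketch (file path hypothetical): one write, then an immediate stat; the OS answers from the page cache whether or not the bytes have reached the physical disk yet.

    file_put_contents('/tmp/cache-demo.bin', str_repeat('x', 1048576));
    clearstatcache(true, '/tmp/cache-demo.bin');
    var_dump(filesize('/tmp/cache-demo.bin')); // int(1048576), served from cache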

    In the case of #2 you should be able to partially work around it by changing the caching method to write-through instead of write-back. Your write performance goes down, but data is always written immediately, and you're at much less risk of data loss.

    EDIT

    FFmpeg is probably touching the filename you give it, using temp file[s] to convert the video, and then moving the finished file to the destination. I'm assuming that the script that fires off the conversion backgrounds the process, since otherwise there wouldn't be any confusion as to whether the completed file exists or not.

    What I would suggest is that instead of forking just ffmpeg into a background process and then testing whether the end file exists, you fork a second PHP script into the background, call ffmpeg from that script without backgrounding it, and then trigger the upload once the conversion completes.

    e.g.:

    //user-facing.php
    <?php
    echo "Queueing your file for processing...";
    // Redirect output and background the worker, so shell_exec() returns
    // immediately instead of blocking on the child's stdout
    shell_exec("/usr/bin/php /path/to/process.php /path/to/source.mpg /path/to/dest.mpg >/dev/null 2>&1 &");
    echo "Done!";
    

    and:

    //process.php
    <?php
    // Run ffmpeg in the foreground so completion is unambiguous;
    // escapeshellarg() guards against spaces/metacharacters in the paths
    exec(sprintf("/path/to/ffmpeg -options %s %s",
        escapeshellarg($argv[1]), escapeshellarg($argv[2])), $output, $exit_code);
    if ($exit_code === 0) {
      upload_to_s3($argv[2]);
    } else {
      // notify someone of the error
    }
    

    This also lets you capture the output and return code from ffmpeg and act on them, instead of wondering why some videos just silently fail to convert.
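
    One caveat: exec() only captures stdout, and ffmpeg writes its log to stderr, so a small tweak (appending 2>&1 to the command) is needed to get the diagnostics into $output:

    exec(sprintf("/path/to/ffmpeg -options %s %s 2>&1",
        escapeshellarg($argv[1]), escapeshellarg($argv[2])), $output, $exit_code);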

    This answer was selected as the accepted answer by the asker.
