duanqilupinf67040 2013-11-21 17:13

Testing whether disk cache buffers have been flushed

I currently have a video file that's being converted to a different format via a shell_exec() call. The call and the format conversion both work correctly; my next step is to push that file up to an S3 bucket.

However, I'd noticed that the filesystem caching won't necessarily flush my newly-written file immediately, so I was pushing a 0-byte file to S3, even though whenever I looked at the file on the filesystem it was the correct length. Inserting an arbitrary 5-second sleep in my code between the call to shell_exec() and the S3 push solved this problem, but it feels very hacky, and I've no way of knowing whether 5 seconds will always be enough, especially when working with larger video files or when the system is under load.

I'm pretty sure that I can't force a disk cache flush unless I execute a sync call (via shell_exec again), but I don't want to use that approach because it will affect all files on the server with any buffered data, not simply the single file that I'm manipulating.
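
(As an aside: PHP 8.1 and later expose fsync(), which flushes a single file's data to disk without a system-wide sync. A minimal sketch, reusing the $myFileName variable from the snippet below:)

$fp = fopen($myFileName, 'r');   // fsync() requires PHP >= 8.1
if ($fp !== false) {
    fsync($fp);                  // blocks until this one file's data is on disk
    fclose($fp);
}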

So I wrote this simple bit of code to monitor the filesize until any disk cache flush is completed:

// Loop until the file has a non-zero size and that size has been
// stable for at least one second
$prevSize = -1;
$size = filesize($myFileName);
while ($prevSize < $size) {
    sleep(1);
    // Drop PHP's cached stat data so filesize() re-reads from the OS
    clearstatcache(true, $myFileName);
    if ($size > 0) {
        $prevSize = $size;
    }
    $size = filesize($myFileName);
}

Basically, it just loops until at least something has been flushed to the file and the file size has stayed the same for at least a second.
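
One thing this loop can't handle is a file that never grows at all. Here's a minimal sketch of the same idea with a hard cap added; the waitForStableSize() helper name and the $maxWait limit are my own invention, not part of the original code:

function waitForStableSize($path, $maxWait = 60) {
    $prevSize = -1;
    for ($i = 0; $i < $maxWait; $i++) {
        clearstatcache(true, $path);
        $size = file_exists($path) ? filesize($path) : 0;
        if ($size > 0 && $size === $prevSize) {
            return true;    // size unchanged for a full second
        }
        $prevSize = $size;
        sleep(1);
    }
    return false;           // gave up after $maxWait seconds
}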

What I don't know is whether a disk flush updates the reported size only once the entire file cache has been flushed to disk, or whether it flushes a few blocks at a time, in which case I might find myself pushing a partially-flushed file to S3 and ending up with a corrupted upload.

Any advice would be appreciated.

EDIT

The existing code looks something like:

private static function pushToS3($oldFilePath, $s3FileName, $newFilePath) {
    // Wait until the converted file's size has stabilised (the loop above)
    self::testFileFlush($newFilePath);
    // Copy the converted file to its S3 destination
    file_put_contents(
        $s3FileName,
        file_get_contents($newFilePath)
    );
}

private function processVideo($oldFilePath, $s3FileName, $newFilePath) {
    // Start conversion; "nohup ... &" backgrounds ffmpeg, so shell_exec()
    // returns the PID immediately while the file is still being written
    $command = "ffmpeg -i \"$oldFilePath\" -y -ar 44100 \"$newFilePath\"";
    $processID = shell_exec("nohup ".$command." >/dev/null & echo $!");

    self::pushToS3($oldFilePath, $s3FileName, $newFilePath);
    unlink($newFilePath);
    unlink($oldFilePath);
}
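
Presumably the $s3FileName above is an s3:// stream path, which only works once a stream wrapper has been registered. A minimal sketch of that setup, assuming the AWS SDK for PHP; the credentials, region, and bucket name are hypothetical placeholders:

require 'vendor/autoload.php';

use Aws\S3\S3Client;

$client = S3Client::factory(array(
    'key'    => 'YOUR_KEY',        // hypothetical credentials
    'secret' => 'YOUR_SECRET',
    'region' => 'us-east-1',
));
// Makes s3://bucket/key paths usable with file_put_contents() et al.
$client->registerStreamWrapper();

// pushToS3() could then write to e.g. "s3://my-bucket/videos/clip.flv"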

This is a modification to old legacy code that ran on a single server, simply storing the files in the server's filesystem; but I've changed the infrastructure to run on multiple AWS EC2 app servers for resilience, using S3 to share file resources between them. Files are uploaded to the app servers by our users, converted to FLV, and pushed to S3 so that they're available to all EC2 instances.

The longer-term solution is going to be AWS Elastic Transcoder, where I can simply push the originals to S3 and submit a queued request to Elastic Transcoder, but that's a while away yet.
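
(For reference, a rough sketch of what that queued request might look like with the AWS SDK for PHP. The pipeline ID, preset ID, keys, and credentials below are hypothetical placeholders, not values from this setup:)

use Aws\ElasticTranscoder\ElasticTranscoderClient;

$transcoder = ElasticTranscoderClient::factory(array(
    'key'    => 'YOUR_KEY',        // hypothetical credentials
    'secret' => 'YOUR_SECRET',
    'region' => 'us-east-1',
));

// Submit a job: Elastic Transcoder reads the original from the pipeline's
// input bucket and writes the converted file to its output bucket
$transcoder->createJob(array(
    'PipelineId' => '1111111111111-abcde1',          // hypothetical pipeline
    'Input'      => array('Key' => 'uploads/original.mpg'),
    'Output'     => array(
        'Key'      => 'converted/original.mp4',
        'PresetId' => '1351620000001-000010',        // hypothetical preset ID
    ),
));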


2 Answers

  • dsn1327 2013-11-21 17:58

    Unless you're doing one of the following, the behaviour you're describing should be impossible:

    1. Writing your data to a temp file, then copying/moving it to the location you're trying to upload.
    2. Mounting the same partition with two different machines, one writing the file and the other attempting to upload it.
    3. Some sort of hacky software buffering is happening.

    Otherwise the FS cache should be completely transparent to anything running on the OS: any request for data that hasn't yet been written to the physical disk is served straight from the cache by the OS.
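
    That transparency is easy to see in a quick test. A minimal sketch (file path hypothetical): one write, then an immediate stat; the OS answers from the page cache whether or not the bytes have reached the physical disk yet.

    file_put_contents('/tmp/cache-demo.bin', str_repeat('x', 1048576));
    clearstatcache(true, '/tmp/cache-demo.bin');
    var_dump(filesize('/tmp/cache-demo.bin')); // int(1048576), served from cache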

    In the case of #2 you should be able to partially work around it by changing the caching method to write-through instead of write-back. Your write performance goes down, but data is always written immediately, and you're at much less risk of data loss.

    EDIT

    FFmpeg is probably touching the filename you give it, using temp file[s] to convert the video, and then moving the finished file to the destination. I'm assuming that the script that fires off the conversion backgrounds the process, since otherwise there wouldn't be any confusion as to whether the completed file exists or not.

    What I would suggest is that instead of forking just ffmpeg into a background process and then testing whether the end file exists, you fork a second PHP script into the background, call ffmpeg from that script without backgrounding it, and then trigger the upload once the conversion completes.

    e.g.:

    //user-facing.php
    <?php
    echo "Queueing your file for processing...";
    // Redirect output and background the worker, so shell_exec() returns
    // immediately instead of blocking on the child's stdout
    shell_exec("/usr/bin/php /path/to/process.php /path/to/source.mpg /path/to/dest.mpg >/dev/null 2>&1 &");
    echo "Done!";
    

    and:

    //process.php
    <?php
    // Run ffmpeg in the foreground so completion is unambiguous;
    // escapeshellarg() guards against spaces/metacharacters in the paths
    exec(sprintf("/path/to/ffmpeg -options %s %s",
        escapeshellarg($argv[1]), escapeshellarg($argv[2])), $output, $exit_code);
    if ($exit_code === 0) {
      upload_to_s3($argv[2]);
    } else {
      // notify someone of the error
    }
    

    This also lets you capture the output and return code from ffmpeg and act on them, instead of wondering why some videos just silently fail to convert.
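
    One caveat: exec() only captures stdout, and ffmpeg writes its log to stderr, so a small tweak (appending 2>&1 to the command) is needed to get the diagnostics into $output:

    exec(sprintf("/path/to/ffmpeg -options %s %s 2>&1",
        escapeshellarg($argv[1]), escapeshellarg($argv[2])), $output, $exit_code);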

    This answer was selected as the accepted answer by the asker.
