I currently have a video file that's being converted to a different format via a `shell_exec()` call. There are no problems with the call or with the format conversion; that all works correctly. My next step is to push that file up to an S3 bucket.
However, I noticed that filesystem caching won't necessarily flush my newly written file immediately, so I was pushing a 0-byte file to S3, even though whenever I looked at it on the filesystem it was the correct length. Inserting an arbitrary 5-second sleep between the call to `shell_exec()` and the S3 push solved this problem, but it feels very hacky, and I have no way of knowing whether 5 seconds will always be enough, especially with larger video files or when the system is under load.
I'm pretty sure I can't force a disk cache flush unless I execute a `sync` call (via `shell_exec()` again), but I don't want to use that approach because it affects every file on the server that has buffered data, not just the single file I'm working with.
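For comparison, PHP 8.1 added `fsync()`, which asks the operating system to write one open file's buffered data to disk rather than flushing everything the way `sync` does. Below is a minimal sketch, assuming PHP 8.1+ on Linux (where `fsync()` is permitted on a descriptor opened read-only); `flushSingleFile()` is a hypothetical helper name. Note that it only makes data that has already been written durable; it does not wait for another process to finish writing the file.

```php
<?php
// Hypothetical helper: flush one file's cached data to disk without a
// system-wide `sync`. Assumes PHP >= 8.1 (where fsync() was added) and Linux,
// where fsync() is allowed on a descriptor opened read-only.
function flushSingleFile(string $path): bool
{
    if (!function_exists('fsync')) {
        return false; // PHP < 8.1: no per-file fsync() available
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false; // file missing or unreadable
    }
    $ok = fsync($handle); // blocks until this file's data and metadata are written
    fclose($handle);
    return $ok;
}
```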
So I wrote this simple bit of code to monitor the file size until any disk cache flush has completed:
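```php
$prevSize = -1;
$size = filesize($myFileName);
while ($prevSize < $size) {
    sleep(1);
    // Drop PHP's cached stat() result so filesize() re-reads the real size.
    clearstatcache(true, $myFileName);
    // Only start comparing sizes once something has been written.
    if ($size > 0) {
        $prevSize = $size;
    }
    $size = filesize($myFileName);
}
```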
Basically, it loops until at least something has been flushed to the file and the file size has stayed the same for at least one second.
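A variant of the loop above with an upper bound on the wait might look like the following, so that it cannot spin forever if the file never appears or keeps growing. This is a minimal sketch; `waitForStableSize()` and its `$stableSeconds`/`$timeoutSeconds` parameters are hypothetical. Like the original loop, it only observes the size, so a writer that pauses for longer than `$stableSeconds` would still be mistaken for one that has finished.

```php
<?php
// Hypothetical helper: wait until $path is non-empty and its size has not
// changed for at least $stableSeconds, giving up after $timeoutSeconds.
// Returns the final size in bytes, or null on timeout.
function waitForStableSize(string $path, int $stableSeconds = 1, int $timeoutSeconds = 300): ?int
{
    $deadline = time() + $timeoutSeconds;
    $lastSize = -1;
    $stableSince = time();

    while (time() < $deadline) {
        clearstatcache(true, $path);                   // force a fresh stat() call
        $size = is_file($path) ? filesize($path) : false;

        if ($size === false || $size === 0) {
            $lastSize = -1;                            // nothing written yet
        } elseif ($size !== $lastSize) {
            $lastSize = $size;                         // size changed: restart the clock
            $stableSince = time();
        } elseif (time() - $stableSince >= $stableSeconds) {
            return $size;                              // unchanged for long enough
        }
        sleep(1);
    }
    return null;                                       // timed out
}
```

For example, `waitForStableSize($newFilePath, 2, 600)` would require two unchanged seconds and give up after ten minutes.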
What I don't know is whether a disk flush updates the size only once all of the file's cached data has been flushed to disk, or whether it flushes a few blocks at a time, in which case I might push a partially flushed file to S3 and end up with a corrupted copy.
Any advice would be appreciated.
EDIT
The existing code looks something like:
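```php
// Both methods belong to an existing class (not shown).
private static function pushToS3($oldFilePath, $s3FileName, $newFilePath) {
    // Wait until the converted file's size stops changing
    // (presumably the polling loop shown above).
    self::testFileFlush($newFilePath);
    // Copy the converted file to its S3 destination.
    file_put_contents(
        $s3FileName,
        file_get_contents($newFilePath)
    );
}

private function processVideo($oldFilePath, $s3FileName, $newFilePath) {
    // Start conversion. Because of "nohup ... &", ffmpeg runs in the
    // background: shell_exec() returns ffmpeg's PID as soon as it has
    // started, not when the conversion has finished.
    $command = "ffmpeg -i \"$oldFilePath\" -y -ar 44100 \"$newFilePath\"";
    $processID = shell_exec("nohup ".$command." >/dev/null & echo $!");
    self::pushToS3($oldFilePath, $s3FileName, $newFilePath);
    unlink($newFilePath);
    unlink($oldFilePath);
}
```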
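In the code above, ffmpeg is launched with `nohup ... &`, so `shell_exec()` returns as soon as ffmpeg has started, while it may still be writing `$newFilePath`. If waiting for the conversion within the same request is acceptable, one alternative is to run the command in the foreground and check its exit status before pushing. Below is a minimal sketch using PHP's built-in `exec()`; `runCommand()` is a hypothetical helper, not part of the existing code.

```php
<?php
// Hypothetical helper: run a shell command in the foreground, wait for it to
// exit, and return its exit status (0 conventionally means success).
// Combined stdout/stderr lines are returned through $output.
function runCommand(string $command, ?array &$output = null): int
{
    $output = [];   // exec() appends to an existing array, so start empty
    $status = 0;
    exec($command . ' 2>&1', $output, $status); // blocks until the command exits
    return $status;
}
```

It could stand in for the `nohup` line, e.g. `runCommand("ffmpeg -i " . escapeshellarg($oldFilePath) . " -y -ar 44100 " . escapeshellarg($newFilePath), $log)`, with `pushToS3()` called only when it returns 0.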
This is a modification of legacy code that ran on a single server and simply stored the files in that server's filesystem. I've since changed the infrastructure to run on multiple AWS EC2 app servers for resilience, using S3 to share file resources between the EC2 instances. Users upload files to the app servers, where they are converted to FLV and pushed to S3 so that they're available to all EC2 instances.
The longer-term solution will be AWS Elastic Transcoder, where I can simply push the originals to S3 and submit a queued transcoding job, but that's still a while away.