How can I stream or send the first 5 seconds of a WAV file from S3 to IBM Watson for speech-to-text conversion?

I am trying to convert .wav files into text so that we can do speech analysis on the calls answered by my company. I have a working prototype, but it is slow: transcribing just 100 files takes a long time, and I need to handle about 30k files a day.

Here is my code so far. It runs in steps, i.e. a user has to start each script one after the other.

First step: get the files from the S3 server.

S3.php
<?php    
require 'aws-autoloader.php';
include 'db.php'; // my DB connection file
set_time_limit(0);
ignore_user_abort(true);
ini_set('max_execution_time', 0);  // no time limit, so the script does not time out
$credentials = new Aws\Credentials\Credentials('', '');

$s3Client = new Aws\S3\S3Client([   // S3 client connection
    'version' => 'latest',
    'region' => 'us-east-1',
    'credentials' => $credentials
    //'debug' => true
]); 

echo "<br>";

$objects = $s3Client->getIterator('ListObjects', array(
    "Bucket" => 'bucket_name',
    "Prefix" => "folder1/folder2/folder3/2017/10/05/"
));

$i = 0;
foreach ($objects as $object) {
    try {
        if ($i == 140) break;  // counter so that only 140 files are fetched
        if ($object['Size'] > 482000 && $object['Size'] < 2750000) { // file filtering: skip objects that are too small or too big
            echo $object['Key'] . "<br>";    
            $i++;
            $cmd = $s3Client->getCommand('GetObject', [
                'Bucket' => 'bucket_name',
                'Key' => $object['Key']
            ]);

            // Create a signed URL from a completely custom HTTP request that
            // will last for 10 minutes from the current time
            $signedUrl = $s3Client->createPresignedRequest($cmd, '+10 minutes');
            $url = (string) $signedUrl->getUri();
            echo $url . "<br>";
            flush();   // push progress output to the browser while the loop runs
            $filename = parse_url($url, PHP_URL_PATH);
            $arr = explode("_", basename($filename));
            $filename = $arr[0] . ".wav";
            file_put_contents('uploads/' . basename($filename), fopen($url, 'r'));   // Storing the files in uploads folder on my Linux server
            // Insert the file name into the DB to keep track of it (bound parameter instead of string concatenation)
            $STH = $DBH->prepare("INSERT INTO `audioFiles` (`audioFile`) VALUES (?)");
            $STH->execute([basename($filename)]);
        }
        //print_r($object);
    } catch (Exception $e) {
        print_r($e);
    }

}
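Because only the first 5 seconds of each call are used, downloading whole objects (the slowest step) is mostly wasted transfer. S3's GetObject honours the standard HTTP `Range` header, so the presigned URL can be asked for just the leading bytes. A minimal sketch, assuming the recordings are uncompressed PCM WAV with a canonical 44-byte header; `wavPrefixBytes()` and `fetchWavPrefix()` are hypothetical helper names, and the format values should be checked with `soxi` on a few real files first.

```php
<?php
// Hypothetical helpers: download only the start of a WAV object from S3.
// Assumption: uncompressed PCM with a canonical 44-byte header. A file with
// extra chunks (e.g. LIST) before "data" would need a few more bytes.

// Number of bytes covering the header plus $seconds of audio.
function wavPrefixBytes(int $sampleRate, int $channels, int $bitsPerSample,
                        float $seconds, int $headerBytes = 44): int
{
    $bytesPerSecond = $sampleRate * $channels * intdiv($bitsPerSample, 8);
    return $headerBytes + (int) ceil($bytesPerSecond * $seconds);
}

// Fetch bytes 0..$bytes-1 of a presigned S3 URL with an HTTP Range request.
function fetchWavPrefix(string $presignedUrl, int $bytes): string
{
    $ch = curl_init($presignedUrl);
    curl_setopt_array($ch, [
        CURLOPT_RANGE => '0-' . ($bytes - 1), // sent as "Range: bytes=0-N"
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FAILONERROR => true,          // treat HTTP 4xx/5xx as failures
    ]);
    $data = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($data === false) {
        throw new RuntimeException("Range request failed: $error");
    }
    return $data;
}
```

For 8 kHz, 16-bit stereo audio (common for narrowband call recordings, but an assumption here) `wavPrefixBytes(8000, 2, 16, 5)` is 160,044 bytes, roughly 160 KB instead of the 0.5 to 2.75 MB objects the size filter lets through. The result can be written to `uploads/` in place of the full `fopen($url, 'r')` copy. Since the header still states the original length, sox may warn about a premature end of file when reading it.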

Once the files are downloaded, I need to split each recording into its left and right channels and use the first 5 seconds of the right channel. I do this because transcribing the entire call is expensive, and this is more of an initial app that needs to scale to thousands of files before we can justify transcribing the full duration of each file.

Here is part of the script used to split the files and extract the first 5 seconds. It gets the names of the files marked 0 from the DB, splits them, and then updates each DB record with the new file names and sets the marker to 1.

Split.php
$sql = "SELECT audioFile FROM audioFiles WHERE split = 0";    // SQL to get the names of files not yet split
$splitFiles = [];                                              // files split in this run; marked split = 1 at the end
$sql_update_agentTranscript = "UPDATE audioFiles SET agentFile = ?, agentAP = ? WHERE audioFile = ?";
.
.
while ($fileName = $STH->fetch()) {
    // sox --i -c prints the channel count: 1 = mono, 2 = stereo
    echo $output = (int) trim((string) shell_exec("sox --i -c " . escapeshellarg($location)));
    $right = substr($location, 0, $extension_pos) . '.AGENT' . substr($location, $extension_pos);
    $ap = substr($location, 0, $extension_pos) . '.AGENT.AP' . substr($location, $extension_pos);
    if ($output == 2) {
        $left = substr($location, 0, $extension_pos) . '.CALLER' . substr($location, $extension_pos);
        exec("sox " . escapeshellarg($location) . " " . escapeshellarg($left) . " remix 1");
        exec("sox " . escapeshellarg($location) . " " . escapeshellarg($right) . " remix 2");
        // first 5 seconds of the right (agent) channel only, not of both channels
        exec("sox " . escapeshellarg($location) . " " . escapeshellarg($ap) . " remix 2 trim 0 5");
    } else if ($output == 1) {
        exec("cp " . escapeshellarg($location) . " " . escapeshellarg($right));
        exec("sox " . escapeshellarg($location) . " " . escapeshellarg($ap) . " trim 0 5");
    } else {
        echo "Something is wrong. The file did not have 1 or 2 channels, or the code is wrong - " . $fileName[0];
        echo "<br>";
        continue;   // leave this file unmarked
    }
    $splitFiles[] = $fileName[0];
    $STH1 = $DBH->prepare($sql_update_agentTranscript);
    $STH1->execute([$right, $ap, $fileName[0]]);
}
if ($splitFiles) {   // skip the update entirely when nothing was split
    $placeholders = implode(',', array_fill(0, count($splitFiles), '?'));
    $sql_update = "UPDATE audioFiles SET split = 1 WHERE audioFile IN ($placeholders)";
    error_log($sql_update, 0);
    $STH = $DBH->prepare($sql_update);
    $STH->execute($splitFiles);
}
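The split step runs sox up to three times per file, yet only the 5-second agent clip is sent for transcription. sox applies effects in the order they are given, so `remix 2 trim 0 5` produces the first 5 seconds of channel 2 in a single pass over just the start of the file. A sketch of that command builder; `buildApCommand()` is a hypothetical helper, and whether the full-length `.CALLER`/`.AGENT` files are still needed for other purposes is an assumption left to you.

```php
<?php
// Hypothetical helper: build one sox command that keeps only channel 2
// (the agent side) and the first $seconds seconds. For mono input the
// remix is skipped and the file is only trimmed.
function buildApCommand(string $in, string $out, int $channels, int $seconds = 5): string
{
    $effects = ($channels === 2 ? 'remix 2 ' : '') . 'trim 0 ' . $seconds;
    // escapeshellarg() stops spaces or quotes in file names from breaking the command
    return 'sox ' . escapeshellarg($in) . ' ' . escapeshellarg($out) . ' ' . $effects;
}

// Example use inside the loop:
// exec(buildApCommand($location, $ap, $output), $lines, $status);
```

With one sox call per file, several files can also be processed side by side (for example one PHP worker per CPU core), since each call is independent.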

Here is the script used to convert the 5-second files into text.

IBM.php
<?php
.
// get file names from the DB with the marker set to 1 by the previous script
$url = 'https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_NarrowbandModel&profanity_filter=false';
$headers = array("Content-Type: audio/wav");
.
if ($STH->rowCount() > 0) {
    while ($fileName = $STH->fetch()) {
        $fileData = file_get_contents($fileName[0]);
        // cURL: send the audio to the IBM API and convert it
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $fileData);   // the whole clip is the request body
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $executed = curl_exec($ch);
        curl_close($ch);
        $result = json_decode($executed);

        $match = "thank you for calling";  // the phrase to look for in the converted text

        // '' when the response has no results (silence or an error body)
        $transcript = $result->results[0]->alternatives[0]->transcript ?? '';
        if (strpos($transcript, $match) !== false) {
            // Update DB with STH1->execute() to say that matching text is found.
        } else {
            // Update DB with STH2->execute() to say that matching text is not found.
        }
    }
} else {
    echo "No more files to convert.";
}
?>
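The check above only looks at `results[0]`, but Watson can return a short clip as several entries in `results` when the speaker pauses, so the phrase may be split across them. A sketch that joins every segment before matching, case-insensitively; `transcriptContains()` is a hypothetical helper, and the response shape is the `results[].alternatives[].transcript` structure already used in IBM.php.

```php
<?php
// Hypothetical helper: check whether a phrase occurs anywhere in a Watson
// /v1/recognize JSON response, joining all result segments, not just results[0].
function transcriptContains(string $json, string $phrase): bool
{
    $response = json_decode($json, true);
    if (!is_array($response) || empty($response['results'])) {
        return false; // error body, silence, or invalid JSON
    }
    $parts = [];
    foreach ($response['results'] as $result) {
        // take the top alternative of each segment
        $parts[] = trim($result['alternatives'][0]['transcript'] ?? '');
    }
    $transcript = implode(' ', $parts);
    return stripos($transcript, $phrase) !== false; // case-insensitive match
}
```

In the loop it would replace the `$transcript`/`strpos` lines with `if (transcriptContains($executed, $match)) { ... }`, which also handles a failed request without touching properties of `null`.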

The script above can be used to convert speech to text with IBM Watson; I am including it in case anyone wants to use it.

I assume the whole three-step process would work for hundreds of calls, but it will not work, or will be too expensive to run, for thousands of calls.
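A quick capacity check makes "too slow" concrete. At 30k files a day the required rate is λ below, and by Little's law the average number of files that must be in flight is N for a per-file end-to-end time W. W here is an assumed value for illustration, not a measurement:

$$
\lambda = \frac{30\,000\ \text{files}}{86\,400\ \text{s}} \approx 0.347\ \text{files/s},
\qquad
N = \lambda W
$$

Processing one file at a time only meets the target if W ≤ 86 400 / 30 000 = 2.88 s per file, including download, sox and the Watson round trip. If W were, say, 10 s, then N ≈ 3.5, so at least 4 files would need to be in progress at once. The audio sent to Watson is 30 000 × 5 s = 150 000 s, about 41.7 hours per day.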

The steps are as follows:

1. Download files from S3 to the server: extremely slow.
2. Split the files: reasonably fast, depending on server power.
3. Transcribe with IBM Watson: moderate. I cannot think of a way to speed this up unless I figure out how to do a huge batch conversion.
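Step 3 is likely dominated by waiting on the network round trip to Watson rather than by local CPU, so sending several clips at once should raise throughput without changing the cost per file. A minimal sketch with PHP's `curl_multi_*` functions, reusing the endpoint and basic-auth credentials from IBM.php; `transcribeBatch()` and the default concurrency of 8 are hypothetical choices, and the rate limits of your Watson plan are not known here.

```php
<?php
// Hypothetical helper: POST several WAV files to Watson concurrently.
// Returns [path => decoded JSON response (or null on failure)].
function transcribeBatch(array $paths, string $url, string $user, string $pass,
                         int $concurrency = 8): array
{
    $results = [];
    // Send at most $concurrency requests at a time.
    foreach (array_chunk($paths, max(1, $concurrency)) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $path) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_USERPWD => "$user:$pass",
                CURLOPT_POST => true,
                CURLOPT_HTTPHEADER => ['Content-Type: audio/wav'],
                CURLOPT_POSTFIELDS => file_get_contents($path),
                CURLOPT_RETURNTRANSFER => true,
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$path] = $ch;
        }
        // Drive all transfers until none are still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh); // wait for activity instead of busy-looping
            }
        } while ($running && $status === CURLM_OK);
        foreach ($handles as $path => $ch) {
            $results[$path] = json_decode((string) curl_multi_getcontent($ch), true);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
```

Chunking keeps the code short, but each chunk waits for its slowest request; a rolling window that adds a new handle as soon as one finishes would keep more requests in flight at the cost of extra bookkeeping.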

I need help optimizing this flow and making it faster than it is now. I was hoping there would be a way to send each file from S3 to IBM Watson directly as a stream, limited to the first 5 seconds. I think this might be possible, but I don't have the slightest idea how to do it.
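The direct route can be approximated without writing anything to disk: fetch only the first ~160 KB of the object with an HTTP `Range` request on the presigned URL, pull the right channel out of the PCM data in PHP, and post the result straight to Watson as the request body. A sketch of the middle part, assuming 16-bit stereo PCM with a canonical 44-byte header and a byte prefix already cut to the wanted duration; `rightChannelWav()` and `monoWav()` are hypothetical names. Other encodings (e.g. µ-law, common in telephony) would still need sox.

```php
<?php
// Hypothetical helper: turn a truncated 16-bit stereo PCM WAV (canonical
// 44-byte header assumed) into a mono WAV holding only channel 2 (right).
function rightChannelWav(string $stereoWav): string
{
    if (strlen($stereoWav) < 44) {
        throw new InvalidArgumentException('too short for a WAV header');
    }
    $fmt = unpack('vformat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbits',
                  substr($stereoWav, 20, 16));
    if ($fmt['format'] !== 1 || $fmt['channels'] !== 2 || $fmt['bits'] !== 16) {
        throw new InvalidArgumentException('expected 16-bit stereo PCM');
    }
    $pcm = substr($stereoWav, 44);
    // Drop a partial frame (4 bytes per stereo frame) left by the byte-range cut.
    $pcm = substr($pcm, 0, strlen($pcm) - strlen($pcm) % 4);
    $right = '';
    for ($i = 2; $i < strlen($pcm); $i += 4) {
        $right .= substr($pcm, $i, 2); // bytes 2..3 of each frame = channel 2
    }
    return monoWav($right, $fmt['sampleRate']);
}

// Build a canonical 44-byte header for 16-bit mono PCM and prepend it.
function monoWav(string $pcm16, int $sampleRate): string
{
    $dataLen = strlen($pcm16);
    return 'RIFF' . pack('V', 36 + $dataLen) . 'WAVE'
        . 'fmt ' . pack('VvvVVvv', 16, 1, 1, $sampleRate, $sampleRate * 2, 2, 16)
        . 'data' . pack('V', $dataLen) . $pcm16;
}
```

Usage would be along the lines of `$body = rightChannelWav($prefix);` followed by the same cURL POST as in IBM.php with `$body` as `CURLOPT_POSTFIELDS`. This takes the full download, the intermediate files and the sox calls out of the per-file path; reaching 30k files a day would still depend on running several requests concurrently.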

Do I need to rebuild it completely? If so, what other options are there?

Any suggestions or ideas will help.

P.S. I apologize for my code indentation.
