OK, that's good to know. Does that mean that during continuous recognition the Speech Service may change the timeouts at runtime? For example, if InitialSilenceTimeout is set to 30 s, will the service change the parameter or ignore it? Can the timeouts be set to several minutes?
Minimum/maximum values for InitialSilence and EndSilence timeouts in the Java SDK
In the Java SDK there are two PropertyId fields, SpeechServiceConnection_EndSilenceTimeoutMs and SpeechServiceConnection_InitialSilenceTimeoutMs. Do these work in 1.9.0, or is it still the case that these property IDs don't affect the SDKs? https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/131
If they are available to the SDK, what are the minimum and maximum values allowed? Setting 0 or 1 doesn't appear to affect the initial silence timeout.
Also, is the BabbleTimeout, mentioned here and there in other api documentation, available to the Java SDK?
I've been testing the InitialSilenceTimeout parameter. What unit are these timeouts in?
5000 -> 2-3 second silence timeout
10000 -> 5 second silence timeout
20000 -> 10 second silence timeout
Above 20000 it's still 10 seconds. Is that the maximum for Interactive mode? I've also noticed that above 20000 we get a "recognized" call after the 10 second mark, and then no "speechEndDetected", whereas the 20000 timeout does send "speechEndDetected".
There are multiple parameters that control how the Speech Service handles silence in various situations. How they interact is fairly complicated, and as Wei mentioned above, changing them often leads to unexpected behavior.
Not all of the parameters are exposed through the SDK.
The four most critical are:
1) Segmentation Timeout: How long the silence after speech is to cause a phrase to be recognized as complete. Not exposed through the SDK.
2) Maximum Segment length: The maximum length a single phrase can be regardless of whether the speaker has paused for the Segmentation Timeout or not. This also controls how long the interval between recognized silence will be for continuous recognition. Not exposed through the SDK.
3) Initial Silence Timeout: The amount of silence at the beginning of a recognition before the Speech Service stops recognizing speech.
4) End Silence Timeout: The amount of silence after speech before the Speech Service stops recognizing speech.
The default values for all 4 vary based on the type of recognition being done.
For RecognizeOnceAsync, one and only one phrase will be recognized. If that phrase is empty because it hit the maximum segment length and contained only silence, that is the only result that will be returned.
For StartContinuousRecognition, the Speech SDK will take steps to keep recognition going in the event the Speech Service wants to stop recognition because of hitting either the InitialSilenceTimeout or the EndSilenceTimeout, but will raise events to indicate that those limits were hit.
As an example, if I set:
and start recognition from a microphone in continuous mode and never speak, I'll get the following events:
2/13/2020 1:31:42 PM Session ID: da2617f4af064328841d9816f5038e48
2/13/2020 1:32:04 PM Recognized: () Offset 1900000 Duration 200700000 RecognizedSpeech
2/13/2020 1:32:24 PM Recognized: () Offset 215100000 Duration 187500000 RecognizedSpeech
2/13/2020 1:32:28 PM Speech Ended 453000000
2/13/2020 1:32:49 PM Recognized: () Offset 472000000 Duration 182700000 RecognizedSpeech
2/13/2020 1:33:09 PM Recognized: () Offset 666700000 Duration 188000000 RecognizedSpeech
2/13/2020 1:33:14 PM Speech Ended 907400000
Thanks, that's a much clearer picture of how the recognition works. Based on the documentation, I had assumed that the Initial Silence Timeout set the amount of time before a Recognized call was made with blank text. It's really helpful to have the expected log output showing which event handlers are triggered. It would be useful to have more examples of the expected event output from a continuous recognition call.
In your example log, why are two recognized calls made before the SpeechEnd is triggered? Is the hidden value of Segmentation Timeout used to mark these two as silent phrases?
2/13/2020 1:31:42 PM Session ID: da2617f4af064328841d9816f5038e48
2/13/2020 1:32:04 PM Recognized: () Offset 1900000 Duration 200700000 RecognizedSpeech
2/13/2020 1:32:24 PM Recognized: () Offset 215100000 Duration 187500000 RecognizedSpeech
2/13/2020 1:32:28 PM Speech Ended 453000000
Thanks for the feedback. I'm glad the examples were helpful. I'll give some thought to where and how to surface that information more.
You're seeing the max segment length drive the empty recognized calls.
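To see the segment length in the log, the Offset and Duration values can be converted to seconds. The following is a minimal sketch, assuming those values are 100-nanosecond ticks (which matches the roughly 20-second gaps between the timestamps in the log). The class name `TickMath` and its method are hypothetical helpers, not part of the Speech SDK.

```java
import java.util.Locale;

public final class TickMath {
    private TickMath() {}

    // Assumption: Offset and Duration are expressed in 100-nanosecond ticks.
    public static final long TICKS_PER_SECOND = 10_000_000L;

    /** Converts a tick count to seconds. */
    public static double ticksToSeconds(long ticks) {
        return ticks / (double) TICKS_PER_SECOND;
    }

    public static void main(String[] args) {
        // {offset, duration} pairs copied from the Recognized events in the log above.
        long[][] recognized = {
            {1_900_000L, 200_700_000L},
            {215_100_000L, 187_500_000L},
            {472_000_000L, 182_700_000L},
            {666_700_000L, 188_000_000L},
        };
        for (long[] event : recognized) {
            System.out.printf(Locale.ROOT, "offset %.2f s, duration %.2f s%n",
                    ticksToSeconds(event[0]), ticksToSeconds(event[1]));
        }
    }
}
```

Each empty Recognized event in that log covers roughly 18 to 20 seconds of audio, which is consistent with a maximum segment length of about 20 seconds ending each silent phrase.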
I'm going to close this as the question has been answered. Please re-open if I'm wrong.
SpeechServiceConnection_EndSilenceTimeoutMs and SpeechServiceConnection_InitialSilenceTimeoutMs were added to the Speech SDK around April 2019, in version 1.5.0. They are supported in Java, C#, C++, etc. Yes, 0 ms and 1 ms do not produce a noticeable change in behavior. Here are some rough guidelines:
InitialSilenceTimeout: default is 5 s for interactive mode, 15 s for conversation mode.
EndSilenceTimeout: default is 5 s for interactive mode, 20 s for conversation mode.
The Cognitive Speech Service may change these values. These are advanced parameters that can significantly affect the behavior of the decoder. Please adjust them with care!
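As a rough illustration of passing these two properties from Java, here is a minimal sketch. It assumes the values are supplied as whole-millisecond strings, as the `Ms` suffix of the property names suggests. The helper class `SilenceTimeouts` is hypothetical. The SDK calls are shown only as comments because they need the Speech SDK on the classpath, and the key and region are placeholders.

```java
import java.time.Duration;

public final class SilenceTimeouts {
    private SilenceTimeouts() {}

    /** Formats a timeout as a whole-millisecond string (assumed unit, per the "Ms" suffix). */
    public static String toMillisString(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        return Long.toString(timeout.toMillis());
    }

    public static void main(String[] args) {
        // Rough interactive-mode defaults quoted in this thread; the service may change them.
        String initialSilence = toMillisString(Duration.ofSeconds(5));
        String endSilence = toMillisString(Duration.ofSeconds(5));

        // With the Speech SDK available, the values would be applied roughly like this:
        // SpeechConfig config = SpeechConfig.fromSubscription("<key>", "<region>");
        // config.setProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, initialSilence);
        // config.setProperty(PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, endSilence);

        System.out.println("InitialSilenceTimeoutMs=" + initialSilence);
        System.out.println("EndSilenceTimeoutMs=" + endSilence);
    }
}
```

Since the service may override or cap whatever is requested, it is worth checking the observed behavior against the events raised during recognition rather than relying on the configured numbers.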
Speech SDK does not have a BabbleTimeout as a parameter. It is a no match reason as it comes back from the service. InitialBabbleTimeout is a noMatchReason in Java as well.