weixin_39833687
2020-12-27 22:42

Minimum/maximum values for the InitialSilence and EndSilence timeouts in the Java SDK

In the Java SDK there are two PropertyId fields, SpeechServiceConnection_EndSilenceTimeoutMs and SpeechServiceConnection_InitialSilenceTimeoutMs. Do these work in 1.9.0, or is it still the case that these property IDs don't affect the SDKs? See https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/131

If they are available to the SDK, what are the minimum and maximum values allowed? Setting 0 or 1 doesn't appear to affect the initial silence timeout.

Also, is the BabbleTimeout, mentioned here and there in other API documentation, available to the Java SDK?

This question comes from the open-source project: Azure-Samples/cognitive-services-speech-sdk


7 answers

  • weixin_39833687 3 months ago

    OK, that's good to know. Does that mean that during continuous recognition the Speech Service may change the timeouts at runtime? I.e., if InitialSilenceTimeout is set to 30 s, will the Speech Service change the parameter or ignore it? Can the timeouts be set to several minutes?

  • weixin_39833687 3 months ago

    I've been testing the InitialSilenceTimeout parameter. What unit are these timeouts in? This is what I observed:

    5000 -> 2-3 second silence timeout
    10000 -> 5 second silence timeout
    20000 -> 10 second silence timeout

    Above 20000 it's still 10 seconds. Is that the maximum for interactive mode? I've also noticed that above 20000 we get a "recognized" call after the 10 second mark, and then no "speechEndDetected", whereas the 20000 timeout does send "speechEndDetected".

  • weixin_39527372 3 months ago

    There are multiple parameters that control how the Speech Service handles silence in various situations. How they interact is fairly complicated, and as Wei mentioned above, changing the parameters often produces unexpected behavior.

    Not all of the parameters are exposed through the SDK.

    The four most critical are:

    1) Segmentation Timeout: How long a silence after speech must last for a phrase to be recognized as complete. Not exposed through the SDK.

    2) Maximum Segment length: The maximum length a single phrase can be regardless of whether the speaker has paused for the Segmentation Timeout or not. This also controls how long the interval between recognized silence will be for continuous recognition. Not exposed through the SDK.

    3) Initial Silence Timeout: The amount of silence at the beginning of a recognition before the Speech Service stops recognizing speech.

    4) End Silence Timeout: The amount of silence after speech before the Speech Service stops recognizing speech.

    The default values for all 4 vary based on the type of recognition being done.

    For RecognizeOnceAsync, one and only one phrase will be recognized. If that phrase is empty because it hit the max segmentation time and only had silence, that's the only result that will be returned.

    For StartContinuousRecognition, the Speech SDK will take steps to keep recognition going in the event the Speech Service wants to stop recognition because of hitting either the InitialSilenceTimeout or the EndSilenceTimeout, but will raise events to indicate that those limits were hit.

    As an example, if I set:

    speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "45000");

    and start recognition from a microphone in continuous mode and never speak, I'll get the following events:

    2/13/2020 1:31:42 PM Session ID: da2617f4af064328841d9816f5038e48
    2/13/2020 1:32:04 PM Recognized: () Offset 1900000 Duration 200700000 RecognizedSpeech
    2/13/2020 1:32:24 PM Recognized: () Offset 215100000 Duration 187500000 RecognizedSpeech
    2/13/2020 1:32:28 PM Speech Ended 453000000
    2/13/2020 1:32:49 PM Recognized: () Offset 472000000 Duration 182700000 RecognizedSpeech
    2/13/2020 1:33:09 PM Recognized: () Offset 666700000 Duration 188000000 RecognizedSpeech
    2/13/2020 1:33:14 PM Speech Ended 907400000
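
The Offset and Duration numbers in the log above are easier to read once converted to seconds. The Java sketch below assumes they are expressed in ticks of 100 nanoseconds, which is how the Speech SDK reports result offsets and durations. The class name `TickOffsets` and its helper are hypothetical and exist only for illustration; the log lines are copied from the reply above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the Speech SDK.
final class TickOffsets {
    // Assumption: one tick is 100 ns, so there are 10,000,000 ticks per second.
    static final long TICKS_PER_SECOND = 10_000_000L;

    static double ticksToSeconds(long ticks) {
        return (double) ticks / TICKS_PER_SECOND;
    }

    public static void main(String[] args) {
        // "Recognized" lines copied from the event log in the reply above.
        String[] log = {
            "2/13/2020 1:32:04 PM Recognized: () Offset 1900000 Duration 200700000 RecognizedSpeech",
            "2/13/2020 1:32:24 PM Recognized: () Offset 215100000 Duration 187500000 RecognizedSpeech",
            "2/13/2020 1:32:49 PM Recognized: () Offset 472000000 Duration 182700000 RecognizedSpeech",
            "2/13/2020 1:33:09 PM Recognized: () Offset 666700000 Duration 188000000 RecognizedSpeech"
        };
        Pattern fields = Pattern.compile("Offset (\\d+) Duration (\\d+)");
        for (String line : log) {
            Matcher m = fields.matcher(line);
            if (m.find()) {
                long offset = Long.parseLong(m.group(1));
                long duration = Long.parseLong(m.group(2));
                // Print where each empty segment starts and how long it lasts.
                System.out.printf("segment starts at %.2f s and lasts %.2f s%n",
                        ticksToSeconds(offset), ticksToSeconds(duration));
            }
        }
    }
}
```

Under that assumption each empty Recognized result covers roughly 18 to 20 seconds of audio, and the two Speech Ended offsets (453000000 and 907400000 ticks) fall about 45 seconds apart, in line with the 45000 value set in the example.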

  • weixin_39833687 3 months ago

    Thanks, that's a much clearer picture of how the recognition works. From the documentation, I had assumed that the initial silence timeout set the amount of time before a Recognized call was made with blank text. It's really helpful to have the expected log output showing which event handlers are triggered. It would be useful to have more examples of the expected event output from a continuous recognition call.

    In your example log, why are two recognized calls made before the SpeechEnd is triggered? Is the hidden value of Segmentation Timeout used to mark these two as silent phrases?

    2/13/2020 1:31:42 PM Session ID: da2617f4af064328841d9816f5038e48
    2/13/2020 1:32:04 PM Recognized: () Offset 1900000 Duration 200700000 RecognizedSpeech
    2/13/2020 1:32:24 PM Recognized: () Offset 215100000 Duration 187500000 RecognizedSpeech
    2/13/2020 1:32:28 PM Speech Ended 453000000

  • weixin_39527372 3 months ago

    Thanks for the feedback that the examples were helpful. I'll give some thought to where and how to surface that information more.

    You’re seeing the max segment length drive the empty recognized calls.

  • weixin_39527372 3 months ago

    I'm going to close this as the question has been answered. Please re-open if I'm wrong.

  • weixin_39552874 3 months ago

    SpeechServiceConnection_EndSilenceTimeoutMs and SpeechServiceConnection_InitialSilenceTimeoutMs were added to the Speech SDK around April 2019, in 1.5.0. They are supported in Java, C#, C++, etc. Yes, 0 ms and 1 ms do not produce a noticeable change in behavior. Here are some rough guidelines:

    InitialSilenceTimeout: default is 5 s for interactive mode, 15 s for conversation mode.
    EndSilenceTimeout: default is 5 s for interactive mode, 20 s for conversation mode.

    The Cognitive Speech Service may change these values. These are advanced parameters that can significantly affect the behavior of the decoder. Please adjust them with care!
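
The rough defaults above can be written down as a small Java sketch. Everything here is illustrative: the class `SilenceTimeoutDefaults`, the `Mode` enum, and the helper names are hypothetical and not part of the Speech SDK, and the values are only the rough guidelines quoted in this reply, which the service may change. The helper also assumes the property value is a string of milliseconds, as the `Ms` suffix of the property names and the "45000" example earlier in this thread suggest.

```java
import java.time.Duration;

// Hypothetical holder for the rough defaults quoted in this thread.
final class SilenceTimeoutDefaults {
    enum Mode { INTERACTIVE, CONVERSATION }

    // Rough guideline only: 5 s interactive, 15 s conversation.
    static Duration initialSilence(Mode mode) {
        return mode == Mode.INTERACTIVE ? Duration.ofSeconds(5) : Duration.ofSeconds(15);
    }

    // Rough guideline only: 5 s interactive, 20 s conversation.
    static Duration endSilence(Mode mode) {
        return mode == Mode.INTERACTIVE ? Duration.ofSeconds(5) : Duration.ofSeconds(20);
    }

    // Assumption: the SDK property takes the timeout as a string of milliseconds.
    static String toPropertyValue(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        return Long.toString(timeout.toMillis());
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode
                    + " initial=" + toPropertyValue(initialSilence(mode))
                    + " end=" + toPropertyValue(endSilence(mode)));
        }
    }
}
```

A string produced by `toPropertyValue` would be the second argument to the SDK's property setter, in the same way as the "45000" literal in the earlier example.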

    The Speech SDK does not have BabbleTimeout as a parameter. It is a no-match reason that comes back from the service. InitialBabbleTimeout is a NoMatchReason value in Java as well.

    Spe
