weixin_39826089
2020-12-27 22:32

Multi-utterance extension of speech_recognizer.recognize_once() API

I'm using the speech_recognizer.recognize_once() method for synchronous/blocking transcription of audio files. However, this method recognizes only a single utterance, and the files I wish to transcribe can contain multiple utterances.

What is the recommended approach for synchronous/blocking multi-utterance transcription of audio files? Would it be possible to extend the speech_recognizer.recognize_once() API to also accept multi-utterance audio files?

This question comes from the open-source project: Azure-Samples/cognitive-services-speech-sdk

7 answers

  • weixin_39813263, 4 months ago

    Please have a look at the continuous recognition mode. There is a sample here. This comment shows how to accumulate all results from continuous recognition.
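A minimal sketch of how continuous recognition might be wrapped into a blocking, multi-utterance call, assuming the Python Speech SDK (azure.cognitiveservices.speech). The helper names transcribe_file and make_file_recognizer are hypothetical and not part of the SDK; key, region, and path are placeholders. The first function only relies on the recognizer's recognized, session_stopped, and canceled signals and on start_continuous_recognition() / stop_continuous_recognition().

```python
import threading


def transcribe_file(recognizer, timeout=None):
    """Run continuous recognition and block until the session ends.

    Returns the text of every recognized utterance, in order.
    Hypothetical helper, not an SDK function.
    """
    done = threading.Event()
    texts = []

    def on_recognized(evt):
        # Fired once per finished utterance; skip empty results.
        if evt.result.text:
            texts.append(evt.result.text)

    def on_stop(evt):
        # Only signal the waiting thread; do not stop from inside the callback.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    done.wait(timeout)  # blocks the caller, like recognize_once() does
    recognizer.stop_continuous_recognition()
    return texts


def make_file_recognizer(path, key, region):
    """Build a recognizer that reads from an audio file (placeholder credentials)."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    return speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
```

Usage would look roughly like texts = transcribe_file(make_file_recognizer("audio.wav", key, region)), followed by " ".join(texts) if a single transcript string is wanted.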

  • weixin_39826089, 4 months ago

    Thanks for the information; the sample code seems to work fine with multiple utterances. However, in continuous recognition mode I can't delete the audio file immediately after recognition is complete. Is there something I have to do to close the audio file used for recognition?

  • weixin_39813263, 4 months ago

    You could try calling del recognizer after the recognition is finished (i.e., after you have received a session stopped event) to clean up the recognizer resources. Also have a look at the Batch API (sample); it may fit your needs.

  • weixin_39826089, 4 months ago

    I have tried deleting both speech_recognizer and the other objects after recognition is completed, but it doesn't make any difference. Could it be that the SDK does not correctly release the file after use when speech_recognizer.start_continuous_recognition() is used?

  • weixin_39813263, 4 months ago

    Hi, could you share the code that shows the problem you are having? Also, could you check the discussion in #352 to see whether it might be related? Thanks!

  • weixin_39826089, 4 months ago

    Thanks for the input. The issue identified in #352 does seem to be the same one I'm having: if I do del speech_recognizer._impl, the clean-up works correctly and I can delete the audio file afterwards without problems. Could the sample code and/or the SDK be updated to avoid this problem?

  • weixin_39813263, 4 months ago

    Good to know that this is the underlying issue. You should then also be able to solve the problem by calling recognizer.canceled.disconnect_all() (and likewise for the other signals) after recognition has finished, or by moving the call to stop_continuous_recognition out of the callback. We do have work items in the backlog to make this easier and clearer, but there is no ETA yet.

    I'll proceed to close the issue, please reopen if you continue to have problems. Thanks!
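The clean-up suggested above could be sketched as follows. This is an illustration under assumptions, not the SDK's documented procedure: transcribe_and_delete and the make_recognizer factory argument are hypothetical names, and whether the file handle is released at exactly this point depends on the SDK version. The idea is to stop recognition outside any callback, call disconnect_all() on each connected signal, and drop the last reference to the recognizer before removing the file.

```python
import os
import threading


def transcribe_and_delete(make_recognizer, audio_path, timeout=None):
    """Transcribe audio_path, release the recognizer, then delete the file.

    make_recognizer is a zero-argument callable that returns a recognizer,
    so this function holds the only reference to it (an assumption).
    """
    recognizer = make_recognizer()
    done = threading.Event()
    texts = []

    def on_recognized(evt):
        if evt.result.text:
            texts.append(evt.result.text)

    def on_stop(evt):
        done.set()

    signals = (recognizer.recognized, recognizer.session_stopped, recognizer.canceled)
    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    try:
        done.wait(timeout)
        # Stop from the calling thread rather than from inside a callback.
        recognizer.stop_continuous_recognition()
    finally:
        # Drop the callbacks so they no longer keep the recognizer alive.
        for signal in signals:
            signal.disconnect_all()

    del signals, recognizer  # release the references held by this function
    os.remove(audio_path)
    return texts
```

Passing a factory rather than a ready-made recognizer is a design choice for this sketch: if the caller kept its own reference, the del inside the function would not release the underlying object.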
