Yeah, that solution does not solve the GPU efficiency issue, but it is useful in that it extends the current behavior without changing our API.
What is your use case?
Our current implementation corresponds to the S=1 result in Table 6 of https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2860.pdf.
Yes, with utterance batching we could get roughly 2x faster decoding (see the sketch at the end of this comment), but a normal test set is much smaller than the training data, and unlike training we don't need multiple epochs, so the overall gain is limited in the common case.
So I don't see major issues with the current implementation.
If the purpose is to perform decoding during training (e.g., MBR), it would be a problem.
Or if your application is to automatically transcribe a huge amount of speech data, utterance batching would bring great benefits.
We also have such use cases, so I can raise the priority of developing utterance batch processing depending on your request.
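For context, here is a minimal sketch of what utterance batching means here (the encoder, feature shapes, and dimensions are illustrative assumptions, not the actual toolkit code): variable-length utterances are padded to a common length so the GPU runs one batched forward pass instead of one pass per utterance.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Hypothetical encoder standing in for the real ASR model.
encoder = torch.nn.LSTM(input_size=80, hidden_size=320, batch_first=True)

# A batch of variable-length feature sequences (n_frames x n_mels).
feats = [torch.randn(1200, 80), torch.randn(800, 80), torch.randn(950, 80)]
lengths = torch.tensor([f.size(0) for f in feats])

# Pad to a common length, then run a single forward pass over the
# whole batch instead of encoding each utterance sequentially.
padded = pad_sequence(feats, batch_first=True)  # (B, T_max, 80)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
with torch.no_grad():
    encoded, _ = encoder(packed)
```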
Thanks. To be honest, my application is more on the engineering side: multiple audio utterances need to be processed at the same time, and it's a time-sensitive situation. Using torch.multiprocessing may not be a good idea because GPU utilization has already hit its bottleneck, whereas utterance batching could further improve the decoding speed (I guess).
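To be concrete, this is roughly the kind of loop I am hoping for (a sketch only; `model.decode_batch` is a hypothetical API, not something that exists in the codebase): decode many utterances from a single process by grouping them into GPU batches, instead of spawning multiprocessing workers that contend for the same GPU.

```python
import torch

def transcribe_all(model, utterances, batch_size=16):
    """Hypothetical utterance-batch decoding loop: one process,
    one GPU, batched forward passes instead of per-utterance calls."""
    results = []
    with torch.no_grad():
        for i in range(0, len(utterances), batch_size):
            batch = utterances[i:i + batch_size]
            # `decode_batch` is assumed here; today the recognizer
            # would have to be called once per utterance instead.
            results.extend(model.decode_batch(batch))
    return results
```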
I think this really describes a situation where utterance-batch decoding can help a lot, and such situations are also common in research.
Thank you for all the helpful discussion; I will close this issue for now.