Abstract |
---|
Non-autoregressive (NAR) models generate multiple outputs of a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive (AR) baselines. Owing to this great potential for real-time applications, an increasing number of NAR models have been explored in different fields to mitigate the performance gap against AR models. In this work, we conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). Experiments are performed in a state-of-the-art setting using ESPnet. The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness to long-form utterances. We also show that the techniques can be combined for further improvement and applied to NAR end-to-end speech translation. All the implementations are publicly available to encourage further research in NAR speech processing. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ASRU51503.2021.9688157 | 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | DocType | ISBN
---|---|---|
Non-autoregressive sequence generation, end-to-end speech recognition, end-to-end speech translation | Conference | 978-1-6654-3740-0
Citations | PageRank | References
---|---|---|
1 | 0.35 | 0
Authors |
---|
9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yosuke Higuchi | 1 | 3 | 3.75 |
Nanxin Chen | 2 | 64 | 7.55 |
Yuya Fujita | 3 | 2 | 1.04 |
Hirofumi Inaguma | 4 | 2 | 0.76 |
Tatsuya Komatsu | 5 | 1 | 1.70
Jaesong Lee | 6 | 1 | 0.69 |
Jumon Nozaki | 7 | 1 | 0.35 |
Tianzi Wang | 8 | 3 | 0.71 |
Shinji Watanabe | 9 | 1158 | 139.38 |