Abstract |
---|
Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets -- in particular, whether models trained on a single dataset transfer to other (possibly out-of-domain) datasets. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world data. Finally, we show that training a single acoustic model on the most widely-used datasets -- combined -- reaches competitive performance on both research and real-world benchmarks. |
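The abstract's claim rests on averaging WER over several benchmarks. As a minimal sketch (not the paper's code; the benchmark names and transcripts below are made up), WER can be computed as word-level edit distance normalized by reference length, then averaged over the pooled utterances of all benchmarks:

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words, one row at a time.
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        row = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution / match
        prev_row = row
    return prev_row[-1] / max(len(ref), 1)

# Hypothetical (reference, hypothesis) pairs per benchmark, for illustration only.
benchmarks = {
    "bench_a": [("the cat sat", "the cat sat")],
    "bench_b": [("hello world", "hello word")],
}
avg = sum(wer(ref, hyp)
          for pairs in benchmarks.values()
          for ref, hyp in pairs) / sum(len(p) for p in benchmarks.values())
```

Averaging over pooled utterances weights each benchmark by its size; averaging per-benchmark WERs instead would weight all benchmarks equally, which is a design choice the sketch leaves open.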
Year | DOI | Venue
---|---|---
2021 | 10.21437/Interspeech.2021-1758 | Interspeech

DocType | Citations | PageRank
---|---|---
Conference | 2 | 0.38

References | Authors
---|---
0 | 8
Name | Order | Citations | PageRank
---|---|---|---
Tatiana Likhomanenko | 1 | 24 | 5.47 |
Qiantong Xu | 2 | 34 | 7.42 |
Vineel Pratap | 3 | 16 | 2.69 |
Paden Tomasello | 4 | 3 | 1.42 |
Jacob Kahn | 5 | 20 | 2.38 |
Gilad Avidov | 6 | 2 | 0.38 |
Ronan Collobert | 7 | 4002 | 308.61 |
Gabriel Synnaeve | 8 | 27 | 7.73 |