Title
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Abstract
Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets -- in particular, whether models trained on a single dataset transfer to other (possibly out-of-domain) datasets. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world data. Finally, we show that training a single acoustic model on the most widely-used datasets -- combined -- reaches competitive performance on both research and real-world benchmarks.
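The abstract's central metric is average WER across a set of benchmarks. The following is a minimal sketch, assuming hypothetical benchmark names and toy transcript pairs (not the paper's data or evaluation code), of how per-dataset WER and that cross-benchmark average can be computed.

```python
# Minimal sketch: per-dataset word error rate (WER) and the cross-benchmark
# average used as a proxy metric. Benchmark names and transcripts below are
# hypothetical placeholders, not the paper's datasets.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def dataset_wer(pairs):
    """Corpus-level WER: total word errors over total reference words."""
    errors = sum(word_error_rate(r, h) * len(r.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / max(words, 1)


if __name__ == "__main__":
    # Hypothetical (reference, hypothesis) pairs for two benchmarks.
    benchmarks = {
        "benchmark_a": [("the cat sat on the mat", "the cat sat on a mat")],
        "benchmark_b": [("speech recognition is hard", "speech recognition is art")],
    }
    per_dataset = {name: dataset_wer(pairs) for name, pairs in benchmarks.items()}
    average_wer = sum(per_dataset.values()) / len(per_dataset)
    print(per_dataset, average_wer)
```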
Year
2021
DOI
10.21437/Interspeech.2021-1758
Venue
Interspeech
DocType
Conference
Citations
2
PageRank
0.38
References
0
Authors
8
Name                    Order    Citations    PageRank
Tatiana Likhomanenko    1        24           5.47
Qiantong Xu             2        34           7.42
Vineel Pratap           3        16           2.69
Paden Tomasello         4        3            1.42
Jacob Kahn              5        20           2.38
Gilad Avidov            6        2            0.38
Ronan Collobert         7        4002         308.61
Gabriel Synnaeve        8        27           7.73