Abstract | | |
---|---|---|
This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, whose task is to recognize highly overlapped dinner-party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. We ultimately achieve a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%, and release this improved baseline, together with the refined techniques and tools, as an advanced CHiME-5 recipe. |
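
The abstract reports results as word error rate (WER) and states the gain as an absolute difference in percentage points. The following is a minimal Python sketch of the standard WER definition (word-level edit distance divided by the number of reference words) and of absolute versus relative reduction, applied to the two development-set figures quoted above. It is an illustration only, not the scoring tooling used in the paper or the CHiME-5 recipe; the function names `word_errors`, `corpus_wer`, and `improvement` are hypothetical, and `corpus_wer` assumes the references contain at least one word in total.

```python
from __future__ import annotations


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference into the hypothesis (word-level Levenshtein)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of reference word r
                curr[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),   # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors over total reference words.
    Assumes the references contain at least one word in total."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


def improvement(baseline: float, new: float) -> tuple[float, float]:
    """Absolute reduction (percentage points) and relative reduction (percent)."""
    absolute = baseline - new
    relative = 100.0 * absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    abs_gain, rel_gain = improvement(81.1, 69.4)
    print(f"absolute: {abs_gain:.1f} points, relative: {rel_gain:.1f}%")
    # prints "absolute: 11.7 points, relative: 14.4%"
```

Pooling errors and reference words over the whole corpus, rather than averaging per-utterance WERs, keeps long utterances from being under-weighted. The same 11.7-point absolute gain corresponds to a relative WER reduction of roughly 14.4%.
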
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/icassp.2019.8682556 | 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) |
Keywords | Field | DocType |
Robust speech recognition, acoustic modeling, Kaldi, CHiME-5 challenge | Beamforming, Computer science, Word error rate, Implementation, Speech recognition, Hidden Markov model, Artificial neural network, Microphone | Conference |
ISSN | Citations | PageRank |
1520-6149 | 0 | 0.34 |
References | Authors | |
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vimal Manohar | 1 | 54 | 7.99 |
Szu-Jui Chen | 2 | 0 | 0.34 |
Zhiqi Wang | 3 | 13 | 3.94 |
Y. Fujita | 4 | 26 | 9.17 |
Shinji Watanabe | 5 | 1158 | 139.38 |
Sanjeev Khudanpur | 6 | 2155 | 202.00 |