Abstract |
---|
Recently, the end-to-end approach has proven its efficacy in monaural multi-speaker speech recognition. However, high word error rates (WERs) still prevent these systems from being used in practical applications. On the other hand, the spatial information in multi-channel signals has proven helpful in far-field speech recognition tasks. In this work, we propose a novel neural sequence-to-sequence (seq2seq) architecture, MIMO-Speech, which extends the original seq2seq to deal with multi-channel input and multi-channel output so that it can fully model multi-channel multi-speaker speech separation and recognition. MIMO-Speech is a fully neural end-to-end framework that is optimized only via an ASR criterion. It comprises three components: 1) a monaural masking network, 2) a multi-source neural beamformer, and 3) a multi-output speech recognition model. With this processing, the input overlapped speech is directly mapped to text sequences. We further adopt a curriculum learning strategy that makes the best use of the training set to improve performance. Experiments on the spatialized wsj1-2mix corpus show that our model achieves a WER reduction of more than 60% compared to the single-channel system, while the separation function above yields high-quality enhanced signals (SI-SDR = 23.1 dB). |
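
The abstract reports enhancement quality as SI-SDR (scale-invariant signal-to-distortion ratio). As a reading aid, the following is a minimal sketch of the commonly used SI-SDR definition, not the authors' evaluation code. The function name `si_sdr` and the toy signals are illustrative assumptions, and plain Python lists stand in for audio arrays.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length sample sequences.

    Illustrative helper (hypothetical name), following the usual definition:
    project the estimate onto the reference, then compare the energy of that
    projection with the energy of the residual.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")

    # Optimal scaling of the reference that best explains the estimate.
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy

    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)

    if residual_energy == 0.0:
        return math.inf   # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


# Toy data (assumed for illustration): a reference plus a small error
# that is orthogonal to it.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [r + 0.1 for r in reference]
score = si_sdr(estimate, reference)
rescaled_score = si_sdr([3.0 * e for e in estimate], reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is the property the "scale-invariant" in the metric's name refers to.
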
Attribute | Value |
---|---|
Year | 2019 |
DOI | 10.1109/ASRU46091.2019.9003986 |
Venue | 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | Overlapped speech recognition, end-to-end, neural beamforming, speech separation, curriculum learning |
Field | Training set, Spatial analysis, Masking (art), End-to-end principle, Computer science, MIMO, Multi channel, Speech recognition, Monaural |
DocType | Conference |
ISBN | 978-1-7281-0307-5 |
Citations | 4 |
PageRank | 0.49 |
References | 0 |
Authors | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xuankai Chang | 1 | 4 | 0.49 |
Wangyou Zhang | 2 | 12 | 5.44 |
Yanmin Qian | 3 | 295 | 44.44 |
Jonathan Le Roux | 4 | 839 | 68.14 |
Shinji Watanabe | 5 | 1158 | 139.38 |