Abstract | ||
---|---|---|
Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention in recent years. Most existing methods feature a signal processing frontend and an ASR backend. In realistic scenarios, these modules are usually trained separately or progressively, which results in either inter-module mismatch or a complicated training process. In this paper, we propose an end-to-end multi-channel model that jointly optimizes the speech enhancement (including speech dereverberation, denoising, and separation) frontend and the ASR backend as a single system. To the best of our knowledge, this is the first work that proposes to optimize dereverberation, beamforming, and multi-speaker ASR in a fully end-to-end manner. The frontend module consists of a weighted prediction error (WPE) based submodule for dereverberation and a neural beamformer for denoising and speech separation. For the backend, we adopt a widely used end-to-end (E2E) ASR architecture. It is worth noting that the entire model is differentiable and can be optimized in a fully end-to-end manner using only the ASR criterion, without the need for parallel signal-level labels. We evaluate the proposed model on several multi-speaker benchmark datasets, and experimental results show that the fully E2E ASR model can achieve competitive performance in both noisy and reverberant conditions, with over 30% relative word error rate (WER) reduction over the single-channel baseline systems. |
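
The abstract does not say which beamforming rule the frontend uses. As a rough illustration of the denoising stage only, the sketch below assumes a mask-based MVDR beamformer in its reference-channel form, applied at a single frequency bin and written in plain NumPy. The function names (`masked_psd`, `mvdr_weights`, `beamform`), the `ref` channel choice, and the `eps` regularizer are illustrative assumptions, not the authors' implementation; the WPE dereverberation step and the ASR backend are omitted.

```python
import numpy as np


def masked_psd(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance (PSD) matrix for one frequency bin.

    X:    complex STFT frames, shape (channels, frames)
    mask: time mask in [0, 1], shape (frames,)
    """
    Xm = X * mask  # the mask broadcasts over the channel axis
    return (Xm @ X.conj().T) / max(mask.sum(), eps)


def mvdr_weights(psd_s, psd_n, ref=0, eps=1e-8):
    """MVDR filter w = (Phi_N^{-1} Phi_S) u / trace(Phi_N^{-1} Phi_S).

    u is a one-hot vector selecting the reference channel `ref`.
    """
    channels = psd_s.shape[0]
    # Diagonal loading keeps the noise PSD invertible (an assumption here).
    psd_n = psd_n + eps * np.eye(channels)
    numerator = np.linalg.solve(psd_n, psd_s)  # Phi_N^{-1} Phi_S
    return numerator[:, ref] / (np.trace(numerator) + eps)


def beamform(w, X):
    """Apply the filter: y[t] = w^H x[t], giving shape (frames,)."""
    return w.conj() @ X
```

In a trainable system the masks would presumably come from a neural network, and the same operations would run in an automatic differentiation framework so that the ASR loss can propagate back through the beamformer. For multiple speakers, one set of masks and one filter per speaker would be needed.
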
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TASLP.2022.3209942 | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Keywords | DocType | Volume |
---|---|---|
Training, Speech recognition, Array signal processing, Speech enhancement, Reverberation, Noise reduction, Feature extraction, End-to-end, dereverberation, beamforming, speech separation, multi-talker speech recognition | Journal | 30 |
ISSN | Citations | PageRank |
---|---|---|
2329-9290 | 0 | 0.34 |
References | Authors |
---|---|
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wangyou Zhang | 1 | 12 | 5.44 |
Xuankai Chang | 2 | 2 | 0.70 |
Christoph Boeddeker | 3 | 3 | 3.84 |
Tomohiro Nakatani | 4 | 1327 | 139.18 |
Shinji Watanabe | 5 | 1158 | 139.38 |
Yanmin Qian | 6 | 295 | 44.44 |