End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party - Citegraph

Paper Info

Title
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party

Abstract
Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention in recent years. Most existing methods feature a signal processing frontend and an ASR backend. In realistic scenarios, these modules are usually trained separately or progressively, which leads to either inter-module mismatch or a complicated training process. In this paper, we propose an end-to-end multi-channel model that jointly optimizes the speech enhancement (including speech dereverberation, denoising, and separation) frontend and the ASR backend as a single system. To the best of our knowledge, this is the first work that proposes to optimize dereverberation, beamforming, and multi-speaker ASR in a fully end-to-end manner. The frontend module consists of a weighted prediction error (WPE) based submodule for dereverberation and a neural beamformer for denoising and speech separation. For the backend, we adopt a widely used end-to-end (E2E) ASR architecture. It is worth noting that the entire model is differentiable and can be optimized in a fully end-to-end manner using only the ASR criterion, without the need for parallel signal-level labels. We evaluate the proposed model on several multi-speaker benchmark datasets, and experimental results show that the fully E2E ASR model can achieve competitive performance under both noisy and reverberant conditions, with over 30% relative word error rate (WER) reduction over the single-channel baseline systems.

Field	Value
Year	2022
DOI	10.1109/TASLP.2022.3209942
Venue	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Keywords	Training, Speech recognition, Array signal processing, Speech enhancement, Reverberation, Noise reduction, Feature extraction, End-to-end, dereverberation, beamforming, speech separation, multi-talker speech recognition
DocType	Journal
Volume	30
ISSN	2329-9290
Citations	0
PageRank	0.34
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Wangyou Zhang	1	12	5.44
Xuankai Chang	2	2	0.70
Christoph Boeddeker	3	3	3.84
Tomohiro Nakatani	4	1327	139.18
Shinji Watanabe	5	1158	139.38
Yanmin Qian	6	295	44.44
