Title
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
Abstract
This paper presents recent progress on integrating speech separation and enhancement (SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous features have been added, including recent state-of-the-art speech enhancement (SE) models with their respective training and evaluation recipes. Importantly, a new interface has been designed to flexibly combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU). To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research. In addition to these new tasks, we also use CHiME-4 and WSJ0-2Mix to benchmark multi- and single-channel SE approaches. Results show that the integration of SE front-ends with back-end tasks is a promising research direction even for tasks besides ASR, especially in the multi-channel scenario. The code is available online at https://github.com/ESPnet/ESPnet. The multi-channel ST and SLU datasets, which are another contribution of this work, are released on HuggingFace.
Year: 2022
DOI: 10.21437/INTERSPEECH.2022-10727
Venue: Conference of the International Speech Communication Association (INTERSPEECH)
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 13
Name | Order | Citations | PageRank
Yen-Ju Lu | 1 | 0 | 0.68
Xuankai Chang | 2 | 0 | 1.01
Chenda Li | 3 | 4 | 3.83
Wangyou Zhang | 4 | 12 | 5.44
Samuele Cornell | 5 | 0 | 0.68
Zhaoheng Ni | 6 | 0 | 0.68
Yoshiki Masuyama | 7 | 11 | 5.66
Brian Yan | 8 | 0 | 1.01
Robin Scheibler | 9 | 0 | 0.34
Zhong-Qiu Wang | 10 | 68 | 9.93
Yu Tsao | 11 | 6 | 5.27
Yanmin Qian | 12 | 295 | 44.44
Shinji Watanabe | 13 | 1158 | 139.38