Title
ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
Abstract
We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates rich automatic speech recognition (ASR)-related models, resources and systems to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It can process both single-channel and multi-channel data, with functionalities including dereverberation, denoising and source separation. We provide all-in-one recipes covering data pre-processing, feature extraction, training and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit; several important functionalities, especially the speech recognition integration that differentiates ESPnet-SE from other open-source toolkits; and experimental results on major benchmark datasets.
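The abstract describes a cascade in which a speech enhancement front-end feeds a downstream ASR module. The following is a minimal NumPy sketch of that data flow for the single-channel case, assuming a time-frequency masking front-end. It is not ESPnet-SE's implementation or API: the names `stft`, `istft`, `enhance`, `asr_features` and `snr_db`, the analysis settings, and the spectral-subtraction-style mask (a simple stand-in for a trained enhancement network, estimated from an assumed speech-free lead-in) are all hypothetical choices made for illustration.

```python
import numpy as np

N_FFT, HOP = 512, 128                # assumed: 32 ms window, 8 ms hop at 16 kHz
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann; squared sum at 75% overlap is constant


def stft(x):
    """Reflect-pad, frame and window the signal; return a (frames, bins) spectrogram."""
    padded = np.pad(x, N_FFT, mode="reflect")
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    frames = np.stack([padded[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Weighted overlap-add inverse of stft(); trims the padding added there."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros((len(frames) - 1) * HOP + N_FFT)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    out /= np.maximum(norm, 1e-8)
    return out[N_FFT:N_FFT + length]


def enhance(noisy, noise_frames=10):
    """Hypothetical front-end: scale the noisy STFT by a real-valued T-F mask.

    The mask is a spectral-subtraction-style gain computed from a noise PSD
    estimated over the first `noise_frames` frames, assumed to be speech-free.
    A trained enhancement network would predict this mask instead.
    """
    spec = stft(noisy)
    noise_psd = np.mean(np.abs(spec[:noise_frames]) ** 2, axis=0)
    power = np.abs(spec) ** 2
    mask = np.clip(1.0 - noise_psd / np.maximum(power, 1e-12), 0.0, 1.0)
    return istft(mask * spec, len(noisy))


def asr_features(wave, eps=1e-10):
    """Hypothetical ASR input: log power spectrogram, shape (frames, N_FFT // 2 + 1)."""
    return np.log(np.abs(stft(wave)) ** 2 + eps)


def snr_db(reference, estimate):
    """Signal-to-noise ratio of `estimate` with respect to `reference`, in dB."""
    return 10 * np.log10(np.sum(reference ** 2) / np.sum((reference - estimate) ** 2))


# Toy demo: 0.25 s of silence followed by a 440 Hz tone, plus white noise.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.where(t >= 0.25, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(sr)
enhanced = enhance(noisy)          # front-end output (waveform)
features = asr_features(enhanced)  # what a downstream recognizer would consume
print(f"SNR noisy {snr_db(clean, noisy):.1f} dB -> enhanced {snr_db(clean, enhanced):.1f} dB")
print("ASR feature shape:", features.shape)
```

Keeping the front-end output as a waveform, rather than passing masked spectra directly, mirrors the idea that the enhancement and recognition stages can be developed and evaluated separately or chained together; in a jointly trained system the mask estimator would instead receive gradients from the ASR loss.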
Year: 2021
DOI: 10.1109/SLT48900.2021.9383615
Venue: 2021 IEEE Spoken Language Technology Workshop (SLT)
Keywords: Open-source, end-to-end, speech enhancement, source separation, speech recognition
DocType: Conference
ISSN: 2639-5479
Citations: 0
PageRank: 0.34
References: 0
Authors (11)
Order  Name                  Citations  PageRank
1      Chenda Li             4          3.83
2      Jing Shi              5          5.80
3      Wangyou Zhang         12         5.44
4      S. Aswin Shanmugam    7          4.21
5      Xuankai Chang         0          0.68
6      Naoyuki Kamo          0          0.68
7      Moto Hira             0          0.34
8      Tomoki Hayashi        96         18.49
9      Christoph Boeddeker   3          3.84
10     Zhuo Chen             153        24.33
11     Shinji Watanabe       1158       139.38