Title | ||
---|---|---|
ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration |
Abstract | ||
---|---|---|
We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with the optional downstream speech recognition module. ESPnet-SE is a new project which integrates rich automatic speech recognition related models, resources and systems to support and validate the proposed front-end implementation (i.e. speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with various functionalities including dereverberation, denoising and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit, several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE from other open source toolkits, and experimental results with major benchmark datasets. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/SLT48900.2021.9383615 | 2021 IEEE Spoken Language Technology Workshop (SLT) |
Keywords | DocType | ISSN |
---|---|---|
Open-source, end-to-end, speech enhancement, source separation, speech recognition | Conference | 2639-5479 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
11 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chenda Li | 1 | 4 | 3.83 |
Jing Shi | 2 | 5 | 5.80 |
Wangyou Zhang | 3 | 12 | 5.44 |
S. Aswin Shanmugam | 4 | 7 | 4.21 |
Xuankai Chang | 5 | 0 | 0.68 |
Naoyuki Kamo | 6 | 0 | 0.68 |
Moto Hira | 7 | 0 | 0.34 |
Tomoki Hayashi | 8 | 96 | 18.49 |
Christoph Boeddeker | 9 | 3 | 3.84 |
Zhuo Chen | 10 | 153 | 24.33 |
Shinji Watanabe | 11 | 1158 | 139.38 |