Abstract |
---|
Continuous speech separation (CSS) is the task of separating speech sources from a long, partially overlapped recording that involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to CSS is to segment the long recording with a fixed-size window and process each window separately. Though effective, this extension fails to model long-range dependencies in speech and thus leads to sub-optimal performance. The recently proposed dual-path modeling could remedy this problem, thanks to its ability to jointly model cross-window dependencies and local-window processing. In this work, we further extend the dual-path modeling framework to the CSS task. A transformer-based dual-path system is proposed, which integrates transformer layers for global modeling. The proposed models are applied to LibriCSS, a real recorded multi-talker dataset, and a consistent WER reduction is observed in the ASR evaluation of the separated speech. In addition, a dual-path transformer equipped with convolutional layers is proposed; it reduces the amount of computation by 30% while achieving a better WER. Furthermore, online-processing dual-path models are investigated, which show a 10% relative WER reduction compared to the baseline. |
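
The windowing and dual-path ideas described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `segment` and `dual_path_pass`, the window and hop sizes, and the mean-subtraction stand-ins for the local and global layers are all assumptions made for illustration. In the paper's setting, the two callables would be learned layers (for example, transformer layers for the cross-window step).

```python
import numpy as np


def segment(signal, window, hop):
    """Split a 1-D signal into fixed-size windows, zero-padding the tail."""
    n = len(signal)
    # Number of windows needed to cover the whole signal.
    num = max(1, int(np.ceil(max(n - window, 0) / hop)) + 1)
    padded = np.zeros((num - 1) * hop + window, dtype=signal.dtype)
    padded[:n] = signal
    return np.stack([padded[i * hop: i * hop + window] for i in range(num)])


def dual_path_pass(chunks, intra_fn, inter_fn):
    """Apply a local step inside each window, then a step across windows.

    chunks has shape (num_windows, window). intra_fn and inter_fn are
    hypothetical placeholders for the local and global model layers.
    """
    # Local path: each window is processed on its own.
    local = np.stack([intra_fn(c) for c in chunks])
    # Global path: each within-window position is processed across windows.
    return np.stack([inter_fn(col) for col in local.T]).T


# Toy stand-in for a model layer: remove the mean along the processed axis.
def demean(x):
    return x - x.mean()


signal = np.arange(10, dtype=float)
chunks = segment(signal, window=4, hop=2)
out = dual_path_pass(chunks, demean, demean)
```

Processing each window separately corresponds to using only the local step; the cross-window step is what lets information flow between windows.
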
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICASSP39728.2021.9414127 | 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) |
Keywords | DocType | Citations |
---|---|---|
continuous speech separation, long recording speech separation, online processing, dual-path modeling | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chenda Li | 1 | 4 | 3.83 |
Zhuo Chen | 2 | 153 | 24.33 |
Yi Luo | 3 | 120 | 13.05 |
Cong Han | 4 | 7 | 4.56 |
Tianyan Zhou | 5 | 12 | 4.79 |
Keisuke Kinoshita | 6 | 494 | 54.81 |
Marc Delcroix | 7 | 699 | 62.07 |
Shinji Watanabe | 8 | 1158 | 139.38 |
Yanmin Qian | 9 | 7 | 4.16 |