Abstract |
---|
Continuous speech separation (CSS) is an emerging task in speech separation that aims to separate overlap-free targets from a long, partially overlapped recording. A straightforward extension of previously proposed sentence-level separation models to this task is to segment the long recording into fixed-length blocks and perform separation on each block independently. However, such a simple extension does not fully account for cross-block dependencies, and the separation performance may not be satisfactory. In this paper, we focus on improving block-level separation performance by exploring methods that utilize cross-block information. Building on the recently proposed dual-path RNN (DPRNN) architecture, we investigate how its interleaved intra- and inter-block modules can help block-level separation. Experimental results show that DPRNN significantly outperforms the baseline block-level model in both offline and block-online configurations under certain settings. |
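
The abstract contrasts independent block-level processing with DPRNN's interleaved intra- and inter-block modules. The following is a minimal, illustrative sketch of that data flow, not the authors' implementation. It assumes a long feature sequence is cut into fixed-length blocks with a hop no larger than the block length; `intra` mixes frames within each block, `inter` mixes the same within-block position across blocks, and an averaged overlap-add stitches the blocks back together. All helper names (`segment`, `dual_path`, `overlap_add`, `running_mean`, `identity`) are hypothetical, and `running_mean` is a toy causal stand-in for the recurrent layers DPRNN actually uses.

```python
from typing import Callable, List

Frame = List[float]                           # one feature vector per time frame
Block = List[Frame]                           # a fixed-length run of frames
SeqOp = Callable[[List[Frame]], List[Frame]]  # maps a sequence to a same-length sequence


def segment(frames: List[Frame], block_len: int, hop: int) -> List[Block]:
    """Split a long frame sequence into fixed-length blocks starting every
    `hop` frames, zero-padding the tail so every block has `block_len` frames."""
    if not 0 < hop <= block_len:
        raise ValueError("assumes 0 < hop <= block_len so every frame is covered")
    dim, n = len(frames[0]), len(frames)
    # 1 block if the sequence fits, otherwise enough blocks to cover every frame
    n_blocks = 1 + max(0, -(-(n - block_len) // hop))
    total = (n_blocks - 1) * hop + block_len
    padded = frames + [[0.0] * dim for _ in range(total - n)]
    return [padded[b * hop:b * hop + block_len] for b in range(n_blocks)]


def overlap_add(blocks: List[Block], hop: int, n_frames: int) -> List[Frame]:
    """Stitch blocks back into one sequence, averaging frames covered by more
    than one block, and drop the zero padding added by `segment`."""
    block_len, dim = len(blocks[0]), len(blocks[0][0])
    total = (len(blocks) - 1) * hop + block_len
    acc = [[0.0] * dim for _ in range(total)]
    count = [0] * total
    for b, block in enumerate(blocks):
        for t, frame in enumerate(block):
            i = b * hop + t
            acc[i] = [a + f for a, f in zip(acc[i], frame)]
            count[i] += 1
    return [[a / count[i] for a in acc[i]] for i in range(n_frames)]


def dual_path(blocks: List[Block], intra: SeqOp, inter: SeqOp) -> List[Block]:
    """One dual-path pass: `intra` runs along time inside each block, then
    `inter` runs across blocks at each within-block position."""
    out = [list(intra(block)) for block in blocks]    # within-block context
    for t in range(len(out[0])):
        column = [block[t] for block in out]          # same position, every block
        for block, frame in zip(out, inter(column)):  # cross-block context
            block[t] = frame
    return out


def running_mean(seq: List[Frame]) -> List[Frame]:
    """Hypothetical toy stand-in for an RNN layer: output i is the mean of
    inputs 0..i, so information only flows forward along the sequence."""
    out: List[Frame] = []
    acc = [0.0] * len(seq[0])
    for i, frame in enumerate(seq, start=1):
        acc = [a + f for a, f in zip(acc, frame)]
        out.append([a / i for a in acc])
    return out


def identity(seq: List[Frame]) -> List[Frame]:
    """Hypothetical no-op module, used to isolate the inter-block step."""
    return seq


if __name__ == "__main__":
    frames = [[float(t)] for t in range(10)]        # 10 frames, 1-dim features
    blocks = segment(frames, block_len=4, hop=2)    # 50% overlap -> 4 blocks
    # Baseline: each block is processed on its own, with no cross-block step
    baseline = [running_mean(b) for b in blocks]
    print(baseline[1][0])   # [2.0]: block 1 sees only its own first frame
    # Dual-path: the inter-block step lets block 1 see block 0
    mixed = dual_path(blocks, intra=identity, inter=running_mean)
    print(mixed[1][0])      # [1.0]: mean of frame 0 (block 0) and frame 2 (block 1)
    print(overlap_add(blocks, hop=2, n_frames=len(frames)) == frames)  # True
```

Dropping the `inter` step reduces this sketch to the independent block-level baseline described above, in which no information crosses block boundaries. Because `running_mean` is causal, using it as the inter-block module only lets a block see earlier blocks, loosely mirroring a block-online setting; a bidirectional module would correspond to the offline case.
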

Year | DOI | Venue |
---|---|---|
2021 | 10.1109/SLT48900.2021.9383514 | 2021 IEEE Spoken Language Technology Workshop (SLT) |

Keywords | DocType | ISSN |
---|---|---|
Continuous speech separation, long recording speech separation, dual-path RNN | Conference | 2639-5479 |

Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |

Authors |
---|
12 |

Name | Order | Citations | PageRank |
---|---|---|---|
Chenda Li | 1 | 4 | 3.83 |
Yi Luo | 2 | 120 | 13.05 |
Cong Han | 3 | 7 | 4.56 |
Jinyu Li | 4 | 0 | 0.34 |
Takuya Yoshioka | 5 | 585 | 49.20 |
Tianyan Zhou | 6 | 12 | 4.79 |
Marc Delcroix | 7 | 699 | 62.07 |
Keisuke Kinoshita | 8 | 494 | 54.81 |
Christoph Boeddeker | 9 | 3 | 3.84 |
Yanmin Qian | 10 | 295 | 44.44 |
Shinji Watanabe | 11 | 1158 | 139.38 |
Zhuo Chen | 12 | 153 | 24.33 |