Title | ||
---|---|---|
TIME-DOMAIN SPEECH EXTRACTION WITH SPATIAL INFORMATION AND MULTI SPEAKER CONDITIONING MECHANISM |
Abstract | ||
---|---|---|
In this paper, we present a novel multi-channel speech extraction system that simultaneously extracts multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved multi-channel time-domain speech separation network which employs speaker embeddings to identify and extract multiple targets without label permutation ambiguity. To efficiently pass speaker information to the extraction model, we propose a new speaker conditioning mechanism that adds a speaker branch for receiving external speaker embeddings. Experiments on 2-channel WHAMR! data show that the proposed system improves source separation performance by 9% relative over a strong multi-channel baseline and increases speech recognition accuracy by more than 16% relative over the same baseline. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICASSP39728.2021.9414092 | 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) |
Keywords | DocType | Citations |
---|---|---|
multi-channel source separation, multi-speaker extraction, noise, reverberation | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.37 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jisi Zhang | 1 | 2 | 0.71 |
Catalin Zorila | 2 | 2 | 2.74 |
Rama Doddipatla | 3 | 2 | 4.09 |
Jon Barker | 4 | 676 | 64.08 |