Title | ||
---|---|---|
Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech |
Abstract | ||
---|---|---|
Speech plays an important role in human-computer interaction. In many real applications, speech is degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when the interference is also a human voice. This work addresses the problem of extracting a target speaker from an interfering speaker using a short piece of anchor speech, which is used to obtain the target speaker's identity. We propose an encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech into an embedding that represents the identity of the target speaker. The decoder uses this speaker identity to extract the target speech from the mixture. To obtain an acoustic-related speaker identity, a dynamic-attention mechanism is used to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction. |
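
The abstract's idea of a time-varying speaker embedding can be illustrated with a minimal sketch. It assumes scaled dot-product scoring between mixture frames and anchor frames; the abstract does not state the scoring function, network layers, or feature dimensions, so the function name `dynamic_attention`, the scoring rule, and the array shapes below are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_attention(mixture_feats, anchor_feats):
    """Build one speaker embedding per mixture frame (illustrative only).

    mixture_feats: (T_mix, D) encoded frames of the mixture.
    anchor_feats:  (T_anchor, D) encoded frames of the anchor speech.
    Returns (embeddings, weights) with shapes (T_mix, D) and (T_mix, T_anchor).
    """
    d = mixture_feats.shape[1]
    # Assumed scoring: scaled dot product between each mixture frame
    # and each anchor frame.
    scores = mixture_feats @ anchor_feats.T / np.sqrt(d)
    # Each mixture frame gets its own distribution over anchor frames.
    weights = softmax(scores, axis=1)
    # The time-varying embedding is the attention-weighted sum of anchor frames.
    embeddings = weights @ anchor_feats
    return embeddings, weights


# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((5, 8))  # 5 mixture frames, 8-dim features
anchor = rng.standard_normal((3, 8))   # 3 anchor frames, 8-dim features
emb, w = dynamic_attention(mixture, anchor)
```

In this sketch each mixture frame receives its own embedding, in contrast to a single fixed vector averaged over the anchor speech; a decoder would then consume `emb` alongside the mixture features.
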
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/APSIPAASC47483.2019.9023204 | 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) |
Keywords | DocType | ISSN |
speaker extraction, dynamic-attention, Encoder-decoder | Conference | 2640-009X |
ISBN | Citations | PageRank |
978-1-7281-3249-5 | 0 | 0.34 |
References | Authors | |
7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hao Li | 1 | 261 | 85.92 |
Xueliang Zhang | 2 | 80 | 19.41 |
Guanglai Gao | 3 | 78 | 24.57 |