Title
Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech
Abstract
Speech plays an important role in human-computer interaction. For many real applications, an annoying problem is that speech is often degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when interference is also human voice. This work addresses the problem of extracting target speaker from interfering speaker with a short piece of anchor speech which is used to obtain the target speaker identify. We propose a encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech to a embedding which is used to represent the identity of target speaker. The decoder utilizes the speaker identity to extract the target speech from mixture. To make a acoustic-related speaker identity, The dynamic-attention mechanism is utilized to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction.
Year
DOI
Venue
2019
10.1109/APSIPAASC47483.2019.9023204
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
DocType
ISSN
speaker extraction,dynamic-attention,Encoder-decoder
Conference
2640-009X
ISBN
Citations 
PageRank 
978-1-7281-3249-5
0
0.34
References 
Authors
7
3
Name
Order
Citations
PageRank
Hao Li126185.92
Xueliang Zhang28019.41
Guanglai Gao37824.57