Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech - Citegraph

Paper Info

Title
Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech

Abstract
Speech plays an important role in human-computer interaction. In many real applications, an annoying problem is that speech is often degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when the interference is also a human voice. This work addresses the problem of extracting a target speaker from an interfering speaker using a short piece of anchor speech, which is used to obtain the target speaker's identity. We propose an encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech into an embedding that represents the identity of the target speaker. The decoder uses this speaker identity to extract the target speech from the mixture. To make the speaker identity acoustic-related, a dynamic-attention mechanism is used to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction.
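
The per-frame embedding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the function names (`dynamic_attention`, `softmax`, `dot`), and the toy two-dimensional features are all assumptions chosen for illustration. The sketch only shows the general idea that each mixture frame attends over the encoded anchor frames and receives its own weighted-sum embedding.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dynamic_attention(anchor_states, mixture_frames):
    """Return one speaker embedding per mixture frame.

    anchor_states: encoder outputs for the anchor speech (list of vectors).
    mixture_frames: mixture-frame features with the same dimension
    (an assumption made so that a plain dot product can score them).
    """
    dim = len(anchor_states[0])
    embeddings = []
    for frame in mixture_frames:
        # Assumed scoring function: scaled dot product between the
        # current mixture frame and every anchor state.
        scores = [dot(frame, state) / math.sqrt(dim) for state in anchor_states]
        weights = softmax(scores)
        # The frame-specific embedding is the attention-weighted sum
        # of the anchor states.
        embedding = [
            sum(w * state[d] for w, state in zip(weights, anchor_states))
            for d in range(dim)
        ]
        embeddings.append(embedding)
    return embeddings


# Toy data (hypothetical): three anchor states, two mixture frames.
anchor_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixture_frames = [[2.0, 0.0], [0.0, 2.0]]
embeddings = dynamic_attention(anchor_states, mixture_frames)
```

Because the attention weights are recomputed for every mixture frame, the two toy frames receive different embeddings, in contrast to a single fixed embedding for the whole anchor utterance.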

Field	Value
Year	2019
DOI	10.1109/APSIPAASC47483.2019.9023204
Venue	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords	speaker extraction, dynamic-attention, Encoder-decoder
DocType	Conference
ISSN	2640-009X
ISBN	978-1-7281-3249-5
Citations	0
PageRank	0.34
References	7
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Hao Li	1	261	85.92
Xueliang Zhang	2	80	19.41
Guanglai Gao	3	78	24.57
