Title
Speaker extraction network with attention mechanism for speech dialogue system
Abstract
Speech dialogue systems are now widely used in many fields, allowing users to interact and communicate with a system through natural language. In practical situations, however, real dialogue scenes contain third-person background speech and background noise, which seriously degrade the intelligibility of the speech signal and reduce speech recognition performance. To tackle this, we exploit a speech separation method that separates the target speech from complex multi-speaker mixtures. We propose a multi-task attention mechanism and adopt TFCN as our audio feature extraction module. Based on the multi-task method, we jointly train with the SI-SDR loss and a cross-entropy speaker classification loss, and then use the attention mechanism to further suppress background vocals in the mixed speech. We evaluate our results not only with the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train our model and demonstrate its effectiveness, we build a background-vocal-removal dataset based on a common dataset. Experimental results show that our model significantly improves the performance of the speech separation model.
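The abstract's joint objective combines SI-SDR with a cross-entropy speaker-classification loss. As a minimal sketch of the SI-SDR metric only (not the authors' implementation; function name and NumPy formulation are illustrative), the scale-invariant signal-to-distortion ratio for one estimate/target pair can be computed as:

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB (higher is better)."""
    # Remove DC offset so the measure is mean-invariant.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled reference.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target
    e_noise = estimate - s_target
    # Power ratio of the target component to the residual noise, in dB.
    return 10 * np.log10(np.dot(s_target, s_target) /
                         (np.dot(e_noise, e_noise) + eps))
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is what makes the measure scale-invariant; in training one would typically minimize the negative SI-SDR alongside the classification loss.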
Year: 2022
DOI: 10.1007/s11761-022-00340-w
Venue: Service Oriented Computing and Applications
Keywords: Speech dialogue system, Speech separation, Multi-task, Attention
DocType: Journal
Volume: 16
Issue: 2
ISSN: 1863-2386
Citations: 0
PageRank: 0.34
References: 14
Authors: 6
Name             Order  Citations  PageRank
Yun Hao          1      0          1.35
Jiaju Wu         2      0          0.34
Xiangkang Huang  3      1          1.36
Zijia Zhang      4      0          0.34
Fei Liu          5      49         7.44
Wu Qingyao       6      259        33.46