Title
Speaker extraction network with attention mechanism for speech dialogue system
Abstract
Speech dialogue systems are now widely used in many fields, allowing users to interact and communicate with a system through natural language. In practical situations, however, real dialogue scenes contain third-person background speech and background noise, which seriously degrade the intelligibility of the speech signal and reduce speech recognition performance. To tackle this, we exploit a speech separation method that separates the target speech from complex multi-speaker mixtures. We propose a multi-task attention mechanism and adopt TFCN as our audio feature extraction module. Based on the multi-task method, we jointly train with the SI-SDR loss and a cross-entropy speaker classification loss, and then use the attention mechanism to further suppress background vocals in the mixed speech. We evaluate our results not only with the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train our model and demonstrate its effectiveness, we build a background-vocal-removal dataset based on a common dataset. Experimental results show that our model significantly improves the performance of the speech separation model.
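The abstract's joint objective combines SI-SDR with a cross-entropy speaker-classification loss. As a minimal sketch of the SI-SDR metric only (not the authors' implementation; function name and NumPy formulation are illustrative), the scale-invariant signal-to-distortion ratio for one estimate/target pair can be computed as:

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB (higher is better)."""
    # Remove DC offset so the measure is mean-invariant.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled reference.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target
    e_noise = estimate - s_target
    # Power ratio of the target component to the residual noise, in dB.
    return 10 * np.log10(np.dot(s_target, s_target) /
                         (np.dot(e_noise, e_noise) + eps))
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is what makes the measure scale-invariant; in training one would typically minimize the negative SI-SDR alongside the classification loss.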
Year: 2022
DOI: 10.1007/s11761-022-00340-w
Venue: Service Oriented Computing and Applications
Keywords: Speech dialogue system, Speech separation, Multi-task, Attention
DocType: Journal
Volume: 16
Issue: 2
ISSN: 1863-2386
Citations: 0
PageRank: 0.34
References: 14
Authors: 6
Name             Order  Citations  PageRank
Yun Hao          1      0          1.35
Jiaju Wu         2      0          0.34
Xiangkang Huang  3      1          1.36
Zijia Zhang      4      0          0.34
Fei Liu          5      49         7.44
Wu Qingyao       6      259        33.46