Title
Seq2seq Attentional Siamese Neural Networks For Text-Dependent Speaker Verification
Abstract
In this paper, we present a Sequence-to-Sequence Attentional Siamese Neural Network (Seq2Seq-ASNN) that leverages temporal alignment information for end-to-end speaker verification. Prior speaker-discriminative neural networks typically compute separate utterance-level speaker representations for the evaluation and enrollment utterances. Our proposed model instead uses a sequence-to-sequence (Seq2Seq) attention mechanism to map the frame-level evaluation representation into the enrollment feature domain, and then generates an utterance-level evaluation-enrollment joint vector for the final similarity measurement. Feature learning, the attention mechanism, and metric learning are jointly optimized using an end-to-end loss function. Experimental results on a Tencent internal voice wake-up dataset show that, for text-dependent speaker verification, our proposed model outperforms various baseline methods, including the traditional i-Vector/PLDA method, multi-enrollment end-to-end speaker verification models, d-vector approaches, and a self-attention model.
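The alignment idea in the abstract can be illustrated with a small, pure-Python forward pass. This is a minimal sketch under stated assumptions, not the authors' architecture: it assumes scaled dot-product attention (the abstract does not name the scoring function), and the pooling into a joint vector, the logistic score with parameters `w` and `b`, and all function names (`align_to_enrollment`, `joint_vector`, `verification_loss`) are hypothetical. Each evaluation frame is re-expressed as an attention-weighted mix of enrollment frames, the per-frame comparisons are pooled into one utterance-level joint vector, and a binary cross-entropy loss is computed on the resulting score. In a real system the frame vectors would come from a trained network and all parts would be optimized jointly by gradient descent; only the forward computation is shown here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_to_enrollment(eval_frames: List[Vector], enroll_frames: List[Vector]) -> List[Vector]:
    """Map each evaluation frame into the enrollment domain (assumed scaled dot-product attention)."""
    d = len(enroll_frames[0])
    contexts = []
    for x in eval_frames:
        weights = softmax([dot(x, e) / math.sqrt(d) for e in enroll_frames])
        # Context vector: attention-weighted average of the enrollment frames.
        contexts.append([sum(w * e[k] for w, e in zip(weights, enroll_frames)) for k in range(d)])
    return contexts


def joint_vector(eval_frames: List[Vector], enroll_frames: List[Vector]) -> Vector:
    """Hypothetical pooling: mean of per-frame normalized element-wise products."""
    contexts = align_to_enrollment(eval_frames, enroll_frames)
    d = len(eval_frames[0])
    joint = [0.0] * d
    for x, c in zip(eval_frames, contexts):
        scale = norm(x) * norm(c) + 1e-8  # epsilon guards against zero-norm frames
        for k in range(d):
            joint[k] += x[k] * c[k] / scale
    return [j / len(eval_frames) for j in joint]


def verification_loss(eval_frames: List[Vector], enroll_frames: List[Vector],
                      label: int, w: float = 5.0, b: float = -2.0) -> Tuple[float, float]:
    """Return (same-speaker probability, binary cross-entropy); w and b are hypothetical."""
    # Summing the joint vector gives the mean per-frame cosine similarity.
    similarity = sum(joint_vector(eval_frames, enroll_frames))
    prob = 1.0 / (1.0 + math.exp(-(w * similarity + b)))
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # keep log() finite
    loss = -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return prob, loss


enroll = [[1.0, 0.0], [0.9, 0.1]]
same_speaker = [[1.0, 0.05], [0.95, 0.0]]
other_speaker = [[0.0, 1.0], [0.1, 0.9]]
print(verification_loss(same_speaker, enroll, label=1))
print(verification_loss(other_speaker, enroll, label=0))
```

With these toy two-dimensional frames, the evaluation utterance that resembles the enrollment frames receives a much higher same-speaker probability than the one that does not. The point the sketch tries to capture is that each evaluation frame is compared against the enrollment frames it aligns with, rather than against a single pooled enrollment vector.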
Year
2019
DOI
10.1109/icassp.2019.8682676
Venue
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
End-to-end speaker verification, text-dependent, Siamese neural networks, Seq2Seq attention
Field
Speaker verification, Similarity measure, Pattern recognition, Computer science, Attention model, Feature extraction, Artificial intelligence, Artificial neural network, Discriminative model, Feature learning
DocType
Conference
ISSN
1520-6149
Citations
2
PageRank
0.35
References
0
Authors
6
Author order
1. Yichi Zhang
2. Meng Yu
3. Na Li
4. Chengzhu Yu
5. Jia Cui
6. Dong Yu