Abstract | ||
---|---|---|
In this paper, we present a Sequence-to-Sequence Attentional Siamese Neural Network (Seq2Seq-ASNN) that leverages temporal alignment information for end-to-end speaker verification. Prior work on speaker-discriminative neural networks usually computes utterance-level speaker representations for evaluation and enrollment. Our proposed model uses a sequence-to-sequence (Seq2Seq) attention mechanism to map the frame-level evaluation representation into the enrollment feature domain, and then generates an utterance-level evaluation-enrollment joint vector for the final similarity measure. Feature learning, the attention mechanism, and metric learning are jointly optimized using an end-to-end loss function. Experimental results show that our proposed model outperforms various baseline methods, including the traditional i-Vector/PLDA method, multi-enrollment end-to-end speaker verification models, d-vector approaches, and a self-attention model, for text-dependent speaker verification on a Tencent internal voice wake-up dataset. |
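
The attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring function, the pooling, or the final classifier, so scaled dot-product attention, mean pooling of an element-wise product, and a logistic output layer are all assumptions chosen for illustration. The function names (`align_to_enrollment`, `joint_vector`, `verification_score`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def align_to_enrollment(eval_frames, enroll_frames):
    """Re-express evaluation frames on the enrollment time axis.

    eval_frames:   (T_eval, d) frame-level evaluation features
    enroll_frames: (T_enroll, d) frame-level enrollment features
    Assumption: scaled dot-product scores; the paper may use another form.
    """
    d = eval_frames.shape[1]
    scores = enroll_frames @ eval_frames.T / np.sqrt(d)  # (T_enroll, T_eval)
    weights = softmax(scores, axis=1)  # each enrollment frame attends over evaluation frames
    aligned = weights @ eval_frames    # (T_enroll, d)
    return aligned, weights


def joint_vector(aligned, enroll_frames):
    """Utterance-level joint vector (assumed: mean of element-wise products)."""
    return (aligned * enroll_frames).mean(axis=0)  # (d,)


def verification_score(eval_frames, enroll_frames, w, b=0.0):
    """Same-speaker score in (0, 1) from an assumed logistic output layer."""
    aligned, _ = align_to_enrollment(eval_frames, enroll_frames)
    joint = joint_vector(aligned, enroll_frames)
    return 1.0 / (1.0 + np.exp(-(joint @ w + b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    evaluation = rng.standard_normal((7, 4))   # 7 frames, 4-dim features
    enrollment = rng.standard_normal((5, 4))   # 5 frames, 4-dim features
    print(verification_score(evaluation, enrollment, w=np.ones(4)))
```

In a trained system, the frame-level features, the attention parameters, and the output layer would be learned jointly from the end-to-end loss; here the inputs and weights are random placeholders.
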

Year | DOI | Venue |
---|---|---|
2019 | 10.1109/icassp.2019.8682676 | 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) |

Keywords | Field | DocType |
---|---|---|
End-to-end speaker verification, text-dependent, Siamese neural networks, Seq2Seq attention | Speaker verification, Similarity measure, Pattern recognition, Computer science, Attention model, Feature extraction, Artificial intelligence, Artificial neural network, Discriminative model, Feature learning | Conference |

ISSN | Citations | PageRank |
---|---|---|
1520-6149 | 2 | 0.35 |

References | Authors |
---|---|
0 | 6 |

Name | Order | Citations | PageRank |
---|---|---|---|
Yichi Zhang | 1 | 2 | 0.35 |
Meng Yu | 2 | 524 | 66.52 |
Na Li | 3 | 37 | 23.63 |
Chengzhu Yu | 4 | 16 | 3.77 |
Jia Cui | 5 | 6 | 2.80 |
Dong Yu | 6 | 6264 | 475.73 |