Title
Self-attention based speaker recognition using Cluster-Range Loss.
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually employ very deep or wide layers, resulting in many parameters and high computational cost. Moreover, the triplet loss, which is widely used in speaker recognition, suffers from training difficulty and inefficiency. In this work, we propose to combine the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. In addition, we propose the Cluster-Range Loss, based on well-designed online exemplar mining, to directly shrink intra-class variation and enlarge inter-class distance. Experiments on the VoxCeleb dataset verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and the softmax cross-entropy loss. For speaker verification, we achieve a competitive EER of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector and x-vector systems.
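The abstract describes the Cluster-Range Loss only at a high level: shrink intra-class variation while enlarging inter-class distance. The sketch below is a minimal illustration of that idea, not the paper's exact formulation; the centroid-based intra-class "range" term, the hinge on centroid separation, and the `margin` parameter are all assumptions made for illustration.

```python
import numpy as np

def cluster_range_loss(embeddings, labels, margin=1.0):
    """Hedged sketch of a cluster-range-style objective.

    Penalizes the largest distance from any sample to its own class
    centroid (the cluster "range") and applies a hinge penalty when the
    closest pair of class centroids falls inside `margin`. The paper's
    actual loss and exemplar-mining strategy may differ.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in classes}

    # Intra-class term: mean over classes of the max sample-to-centroid distance.
    intra = np.mean([
        np.linalg.norm(embeddings[labels == c] - centroids[c], axis=1).max()
        for c in classes
    ])

    # Inter-class term: hinge on the minimum centroid-to-centroid distance.
    inter = min(
        np.linalg.norm(centroids[a] - centroids[b])
        for i, a in enumerate(classes) for b in classes[i + 1:]
    )
    return intra + max(0.0, margin - inter)
```

With two tight, well-separated clusters the loss is near zero; overlapping clusters yield both a large intra-class range and an active hinge term, so gradient descent on such an objective would pull samples toward their centroids and push centroids apart.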
Year: 2019
DOI: 10.1016/j.neucom.2019.08.046
Venue: Neurocomputing
Keywords: Self-attention, Speaker recognition, Triplet loss
Field: Speaker verification, Cross entropy, Residual, Speaker identification, Softmax function, Pattern recognition, Convolutional neural network, Inefficiency, Speaker recognition, Artificial intelligence, Mathematics
DocType: Journal
Volume: 368
ISSN: 0925-2312
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order  Citations  PageRank
Tengyue Bian    1      0          0.34
Fangzhou Chen   2      0          1.01
Xu, L.          3      30         2.71