Title
Self-attention based speaker recognition using Cluster-Range Loss.
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually employ very deep or wide layers, resulting in many parameters and high computational cost. Moreover, the triplet loss, which is widely used in speaker recognition, suffers from training difficulty and inefficiency. In this work, we propose to combine the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. In addition, we propose the Cluster-Range Loss, based on well-designed online exemplar mining, to directly shrink intra-class variation and enlarge inter-class distance. Experiments on the VoxCeleb dataset verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and the softmax cross-entropy loss. For speaker verification, we achieve a competitive EER of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector and x-vector systems.
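The abstract describes the Cluster-Range Loss only at a high level: shrink intra-class variation while enlarging inter-class distance. The sketch below is a minimal illustration of that idea, not the paper's exact formulation; the centroid-based intra-class "range" term, the hinge on centroid separation, and the `margin` parameter are all assumptions made for illustration.

```python
import numpy as np

def cluster_range_loss(embeddings, labels, margin=1.0):
    """Hedged sketch of a cluster-range-style objective.

    Penalizes the largest distance from any sample to its own class
    centroid (the cluster "range") and applies a hinge penalty when the
    closest pair of class centroids falls inside `margin`. The paper's
    actual loss and exemplar-mining strategy may differ.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in classes}

    # Intra-class term: mean over classes of the max sample-to-centroid distance.
    intra = np.mean([
        np.linalg.norm(embeddings[labels == c] - centroids[c], axis=1).max()
        for c in classes
    ])

    # Inter-class term: hinge on the minimum centroid-to-centroid distance.
    inter = min(
        np.linalg.norm(centroids[a] - centroids[b])
        for i, a in enumerate(classes) for b in classes[i + 1:]
    )
    return intra + max(0.0, margin - inter)
```

With two tight, well-separated clusters the loss is near zero; overlapping clusters yield both a large intra-class range and an active hinge term, so gradient descent on such an objective would pull samples toward their centroids and push centroids apart.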
Year: 2019
DOI: 10.1016/j.neucom.2019.08.046
Venue: Neurocomputing
Keywords: Self-attention, Speaker recognition, Triplet loss
Field: Speaker verification, Cross entropy, Residual, Speaker identification, Softmax function, Pattern recognition, Convolutional neural network, Inefficiency, Speaker recognition, Artificial intelligence, Mathematics
DocType: Journal
Volume: 368
ISSN: 0925-2312
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order  Citations  PageRank
Tengyue Bian    1      0          0.34
Fangzhou Chen   2      0          1.01
Xu, L.          3      30         2.71