Abstract
---
As an emerging biometric recognition technology, speaker recognition is gaining increasing attention because of its advantages in remote authentication. In this paper, we construct an end-to-end speaker recognition model named GAPCNN, in which a convolutional neural network extracts speaker embeddings from spectrograms and verification is performed by computing the cosine similarity between embeddings. In addition, we use global average pooling instead of the traditional temporal average pooling to handle utterances of varying length. We train on the ‘dev’ set of VoxCeleb2, evaluate the model on the test set of VoxCeleb1, and obtain an equal error rate (EER) of 4.04%. Furthermore, we fuse GAPCNN with the x-vector model and the thin-ResNet model with GhostVLAD, obtaining an EER of 3.01%, which is better than any of the three models alone. This indicates that GAPCNN is an important complement to the x-vector model and the thin-ResNet model with GhostVLAD.
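The two mechanisms the abstract names can be illustrated briefly: global average pooling collapses the time axis of CNN feature maps into a fixed-size embedding whatever the utterance length, and the cosine similarity between two embeddings serves as the verification score. The sketch below is a minimal illustration of these two operations only, not the paper's GAPCNN architecture; the feature maps and dimensions are hypothetical.

```python
import numpy as np

def global_average_pool(features: np.ndarray) -> np.ndarray:
    """Average CNN feature maps (channels x time) over the time axis,
    yielding a fixed-size embedding regardless of utterance length."""
    return features.mean(axis=-1)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Verification score between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature maps for two utterances of different lengths
# (64 channels; 120 vs. 200 time frames): pooling yields same-size embeddings.
rng = np.random.default_rng(0)
f1 = rng.standard_normal((64, 120))
f2 = rng.standard_normal((64, 200))
e1, e2 = global_average_pool(f1), global_average_pool(f2)
score = cosine_similarity(e1, e2)  # accept same-speaker if above a threshold
```

A temporal-average-pooling layer computes the same statistic over a fixed window; the point of global averaging is that the embedding dimension depends only on the channel count, so no input padding or cropping is needed at test time.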
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/s10772-022-09973-w | International Journal of Speech Technology |
Keywords | DocType | Volume
---|---|---
Speaker recognition, Speaker verification, Model fusion, CNN | Journal | 25

Issue | ISSN | Citations
---|---|---
2 | 1381-2416 | 0

PageRank | References | Authors
---|---|---
0.34 | 2 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Wu Hao | 1 | 0 | 0.34 |
Linkai Luo | 2 | 163 | 14.00 |
Hong Peng | 3 | 14 | 10.33 |
Wen Wei | 4 | 0 | 0.34 |