End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy - Citegraph

Paper Info

Title
End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy

Abstract
End-to-end speaker verification directly estimates the similarity score between a pair of utterances, which is formulated as a binary (i.e., target versus non-target) classification problem. Unlike the stage-wise method, an end-to-end verification approach optimizes the evaluation metrics directly and its output layer is parameter-free, which saves considerable computing and memory resources. However, it faces two important difficulties. The first is how to deal with severely imbalanced trials, i.e., the number of target trials is much smaller than that of non-target trials; the second is how to handle easy trials that do not help improve the model in training. To address these two issues, we propose a binary cross-entropy (BCE) type of loss function and present a method to train deep neural network (DNN) models based on the proposed loss function for end-to-end speaker verification. The training process employs a bipartite ranking method to deal with the trial imbalance problem and a curriculum learning method that improves both the training stability and the performance of the model by gradually selecting non-target trials from easy to hard as training converges. Since the training process employs bipartite ranking and curriculum learning and the loss function is of the generalized BCE form, we name the new approach curriculum bipartite ranking weighted binary cross-entropy (CBRW-BCE). Experimental results show that the model trained with CBRW-BCE not only achieves state-of-the-art performance but is also well calibrated.
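
The two ideas named in the abstract, rebalancing target against non-target trials and feeding non-target trials from easy to hard, can be illustrated with a small sketch. This is not the CBRW-BCE loss from the paper, whose exact form is not given on this page. It is a toy class-balanced BCE over trial logits, and the function name, the `progress` and `min_frac` parameters, and the linear schedule are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def balanced_curriculum_bce(scores, labels, progress, min_frac=0.25):
    """Toy class-balanced BCE with an easy-to-hard non-target schedule.

    scores   : similarity logits, one per trial (assumed input format)
    labels   : 1 for target trials, 0 for non-target trials
    progress : training progress in [0, 1] (hypothetical schedule variable)
    min_frac : share of non-target trials used at progress 0 (assumed)
    """
    target = [s for s, y in zip(scores, labels) if y == 1]
    # A low logit on a non-target trial is "easy", so sort easy -> hard.
    nontarget = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target trial")

    # Assumed linear schedule: start with the easiest trials and widen
    # the selection towards harder ones as training progresses.
    frac = min_frac + (1.0 - min_frac) * min(max(progress, 0.0), 1.0)
    keep = max(1, math.ceil(frac * len(nontarget)))
    selected = nontarget[:keep]

    # BCE on logits: -log(sigmoid(s)) = softplus(-s) for targets,
    # -log(1 - sigmoid(s)) = softplus(s) for non-targets.
    target_loss = sum(softplus(-s) for s in target) / len(target)
    nontarget_loss = sum(softplus(s) for s in selected) / len(selected)
    # Averaging each class separately keeps the many non-target trials
    # from dominating the few target trials.
    return 0.5 * (target_loss + nontarget_loss)


scores = [2.0, -3.0, -1.0, 0.5, 1.5]
labels = [1, 0, 0, 0, 0]
early = balanced_curriculum_bce(scores, labels, progress=0.0)
late = balanced_curriculum_bce(scores, labels, progress=1.0)
```

With these made-up scores, `early` uses only the easiest non-target trial while `late` uses all four, so the loss grows as harder trials enter. Whether easy trials should later be dropped again is left open here, since the abstract does not say.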

Field	Value
Year	2022
DOI	10.1109/TASLP.2022.3161155
Venue	IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Keywords	Training, Measurement, Feature extraction, Calibration, Training data, Speech processing, Roads, End-to-end, metric learning, bipartite ranking, curriculum learning, calibration
DocType	Journal
Volume	30
Issue	1
ISSN	2329-9290
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Zhongxin Bai	1	2	1.87
Jianyu Wang	2	0	0.34
Xiao-Lei Zhang	3	89	7.01
Jingdong Chen	4	1460	128.79

Cited by (0 rows)

References (0 rows)
