Title
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends
Abstract
A typical speaker recognition system involves two modules: a feature extractor front-end and a speaker identification back-end. Although deep neural networks achieve superior performance in the front-end, their success depends on the availability of large-scale, correctly labeled datasets. Label noise is unavoidable in speaker recognition datasets, and it affects both the front-end and the back-end, degrading speaker recognition performance. In this paper, we first conduct comprehensive experiments to better understand the effects of label noise on both the front-end and the back-end. Then, we propose a simple yet effective training paradigm and loss correction method to handle label noise in the front-end. We combine our proposed method with the recently proposed Bayesian estimation of PLDA for noisy labels, and the whole system shows strong robustness to label noise. Furthermore, we show two practical applications of the improved system: one corrects noisy labels based on an utterance's chunk-level predictions, and the other algorithmically filters out high-confidence noisy samples within a dataset. By applying the second application to the NIST SRE04-10 dataset and verifying the filtered utterances through human validation, we find that approximately 1% of the NIST SRE04-10 dataset consists of label errors.
Year
2022
DOI
10.1109/TASLP.2022.3169977
Venue
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Keywords
Speech processing, Licenses, Speaker recognition, noisy labels, x-vector, probabilistic linear discriminant analysis
DocType
Journal
Volume
30
Issue
1
ISSN
2329-9290
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order  Citations  PageRank
Lin Li          1      323        79.92
Fuchuan Tong    2      0          1.01
Q. Y. Hong      3      50         15.79