A Principle Solution for Enroll-Test Mismatch in Speaker Recognition - Citegraph

Paper Info

Title
A Principle Solution for Enroll-Test Mismatch in Speaker Recognition

Abstract
Mismatch between enrollment and test conditions causes serious performance degradation in speaker recognition systems. This paper presents a statistics decomposition (SD) approach to solve this problem. This approach decomposes the PLDA score into three components corresponding to enrollment, prediction, and normalization, respectively. Provided that the correct statistics are used in each component, the resultant score is theoretically optimal. A comprehensive experimental study was conducted on three datasets with different types of mismatch: (1) physical channel mismatch, (2) long-term speaker characteristics mismatch, and (3) near-far recording mismatch. The results demonstrated that the proposed SD approach is highly effective and outperforms the ad hoc multi-condition training approach, which is commonly adopted but not optimal in theory.
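
To make the three components concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified isotropic two-covariance Gaussian model: speaker means are drawn from N(0, eps * I), the same speaker mean applies in both conditions, and the conditions differ only in their within-speaker variance. The names `eps`, `sig_enroll`, `sig_test`, `enrollment_posterior`, and `sd_score` are hypothetical, and the paper's actual formulation may differ (for example, by mapping speaker means between conditions).

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) evaluated at x."""
    x = np.asarray(x, dtype=float)
    diff = x - mean
    return -0.5 * (x.size * np.log(2.0 * np.pi * var) + diff @ diff / var)

def enrollment_posterior(enroll, eps, sig_enroll):
    """Enrollment: posterior over the speaker mean, using enrollment-condition statistics."""
    enroll = np.atleast_2d(np.asarray(enroll, dtype=float))
    n = enroll.shape[0]
    xbar = enroll.mean(axis=0)
    post_mean = (n * eps / (n * eps + sig_enroll)) * xbar
    post_var = eps * sig_enroll / (n * eps + sig_enroll)
    return post_mean, post_var

def sd_score(enroll, test, eps, sig_enroll, sig_test):
    """Log-likelihood ratio built from the three components (assumed model)."""
    post_mean, post_var = enrollment_posterior(enroll, eps, sig_enroll)
    # Prediction: likelihood of the test vector given the enrolled speaker,
    # using the test-condition within-speaker variance.
    log_pred = gaussian_logpdf(test, post_mean, post_var + sig_test)
    # Normalization: marginal likelihood of the test vector over all speakers,
    # again with test-condition statistics.
    log_norm = gaussian_logpdf(test, 0.0, eps + sig_test)
    return log_pred - log_norm

# Toy usage: one enrollment vector, a matching and a non-matching test vector.
enroll = [[1.0, 1.0]]
target = sd_score(enroll, [1.0, 1.0], eps=1.0, sig_enroll=0.1, sig_test=0.2)
nontarget = sd_score(enroll, [-1.0, -1.0], eps=1.0, sig_enroll=0.1, sig_test=0.2)
```

In this toy setting the matching test vector receives a positive score and the non-matching one a negative score. Setting `sig_enroll` equal to `sig_test` reduces the sketch to a matched-condition score under the same assumed model.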

Year	2022
DOI	10.1109/TASLP.2022.3140558
Venue	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Keywords	Condition mismatch, deep speaker embedding, speaker recognition
DocType	Journal
Volume	30
Issue	1
ISSN	2329-9290
Citations	0
PageRank	0.34
References	0
Authors	7

Authors (7 rows)

Name	Order	Citations	PageRank
Lantian Li	1	53	13.55
Dong Wang	2	375	39.86
Jiawen Kang	3	0	1.35
Renyu Wang	4	0	1.01
Jing Wu	5	49	16.62
Zhendong Gao	6	0	0.34
Xi Chen	7	333	70.76