Discriminative In-Set/Out-of-Set Speaker Recognition - Citegraph

Paper Info

Title
Discriminative In-Set/Out-of-Set Speaker Recognition

Abstract
In this paper, the problem of identifying in-set versus out-of-set speakers for limited training/test data durations is addressed. The recognition objective is to decide whether an input speaker is a legitimate member of a set of enrolled speakers or an outside speaker. The general goal is rapid speaker model construction from limited enrollment and test data for in-set testing of input audio streams. In-set detection can help ensure security and proper access to private information, as well as support detecting and tracking input speakers. Application areas include rapid speaker tagging and tracking for information retrieval, communication networks, personal device assistants, and location access. We propose an integrated system with emphasis on short enrollment data (about 5 s of speech for each enrolled speaker) and short test data (2-8 s) in a text-independent mode. We present a simple yet powerful decision rule that accepts or rejects speakers using a discriminative vector in the decision score space, together with statistical hypothesis testing based on the conventional likelihood ratio test. Discriminative training, using the minimum classification error and minimum verification error frameworks, is introduced to further improve system performance for both decision techniques. Experiments are performed on three separate corpora. On the YOHO speaker recognition database, the alternative decision rule achieves measurable improvement over the likelihood ratio test, and discriminative training consistently enhances overall system performance, with relative improvements ranging from 11.26% to 28.68%. A further extended evaluation using TIMIT (CORPUS1) and actual noisy aircraft communications data (CORPUS2) shows measurable improvement over the traditional MAP-based scheme using the likelihood ratio test (MAP-LRT), with average equal error rates (EERs) of 9%-23% for TIMIT and 13%-32% for noisy aircraft communications. The results confirm that an effective in-set/out-of-set speaker recognition system can be formulated using discriminative training for rapid tagging of input speakers from limited training and test data sizes.
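The abstract compares the proposed decision rule against the conventional likelihood ratio test and reports results as equal error rates. As a rough illustration of that baseline only (not the authors' discriminative-vector rule, their MCE/MVE training, or their MAP adaptation), the sketch below assumes that each enrolled speaker model and a background model already yield a log-likelihood for the test utterance. The helper names `in_set_llr`, `decide`, and `equal_error_rate`, and the Gaussian synthetic scores, are hypothetical.

```python
import random


def in_set_llr(speaker_loglikes, background_loglike):
    """Log-likelihood ratio for the in-set hypothesis (hypothetical helper).

    Takes the best-scoring enrolled speaker model and subtracts the
    log-likelihood of a background (out-of-set) model.
    """
    return max(speaker_loglikes) - background_loglike


def decide(speaker_loglikes, background_loglike, threshold):
    """Accept the input speaker as in-set when the LLR reaches the threshold."""
    llr = in_set_llr(speaker_loglikes, background_loglike)
    return "in-set" if llr >= threshold else "out-of-set"


def equal_error_rate(in_set_scores, out_set_scores):
    """Sweep the threshold over all observed scores and return (eer, threshold).

    FRR: fraction of in-set trials scored below the threshold (falsely rejected).
    FAR: fraction of out-of-set trials at or above it (falsely accepted).
    The EER is approximated at the threshold where |FRR - FAR| is smallest.
    """
    best = None
    for t in sorted(in_set_scores + out_set_scores):
        frr = sum(s < t for s in in_set_scores) / len(in_set_scores)
        far = sum(s >= t for s in out_set_scores) / len(out_set_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # One test utterance scored against three enrolled models and a background model.
    print(decide([-41.2, -38.7, -45.0], -40.1, threshold=0.0))  # LLR is about 1.4, so "in-set"

    # Synthetic LLR scores (made up for illustration, not data from the paper).
    random.seed(0)
    in_set = [random.gauss(2.0, 1.0) for _ in range(200)]
    out_set = [random.gauss(0.0, 1.0) for _ in range(200)]
    eer, thr = equal_error_rate(in_set, out_set)
    print(f"EER ~ {eer:.1%} at threshold {thr:.2f}")
```

Sweeping the threshold trades false rejections of enrolled speakers against false acceptances of outsiders; the EER is the operating point where the two rates meet, which is the kind of figure the abstract reports for TIMIT and the aircraft communications data.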

Year: 2007
DOI: 10.1109/TASL.2006.881689
Venue: IEEE Transactions on Audio, Speech & Language Processing
Keywords: out-of-set speaker recognition, discriminative in-set, input speaker, test size resource, test data, test data size, likelihood ratio test, measurable improvement, test data duration, discriminative training, conventional likelihood ratio test, limited training, localization, discriminant analysis, speech processing, statistical hypothesis testing, hypothesis test, statistical test, database, decision rule, statistical testing, integrable system, speaker recognition, communication networks, system performance, information retrieval, private information
Field: TIMIT, Likelihood-ratio test, Computer science, Speaker recognition, Artificial intelligence, Speaker diarisation, Discriminative model, Statistical hypothesis testing, Decision rule, Pattern recognition, Speech recognition, Test data, Machine learning
DocType: Journal
Volume: 15
Issue: 2
ISSN: 1558-7916
Citations: 18
PageRank: 0.88
References: 26
Authors: 2

Authors

Name	Order	Citations	PageRank
Pongtep Angkititrakul	1	179	15.47
John H. L. Hansen	2	3215	365.75
