Title
Discriminative In-Set/Out-of-Set Speaker Recognition
Abstract
In this paper, the problem of identifying in-set versus out-of-set speakers for limited training/test data durations is addressed. The recognition objective is to decide whether an input speaker is a legitimate member of a set of enrolled speakers or an outside speaker. The general goal is to perform rapid speaker model construction from limited enrollment and test size resources for in-set testing of input audio streams. In-set detection can help ensure security and proper access to private information, as well as help detect and track input speakers. Application areas for these concepts include rapid speaker tagging and tracking for information retrieval, communication networks, personal device assistants, and location access. We propose an integrated system with emphasis on short enrollment data (about 5 s of speech for each enrolled speaker) and short test data (2-8 s) within a text-independent mode. We present a simple yet powerful decision rule to accept or reject speakers using a discriminative vector in the decision score space, together with statistical hypothesis testing based on the conventional likelihood ratio test. Discriminative training is introduced to further improve system performance for both decision techniques, by employing minimum classification error and minimum verification error frameworks. Experiments are performed using three separate corpora. Using the YOHO speaker recognition database, the alternative decision rule achieves measurable improvement over the likelihood ratio test, and discriminative training consistently enhances overall system performance, with relative improvements ranging from 11.26% to 28.68%. A further extended evaluation using TIMIT (CORPUS1) and actual noisy aircraft communications data (CORPUS2) shows measurable improvement over the traditional MAP-based scheme using the likelihood ratio test (MAP-LRT), with average EERs of 9%-23% for TIMIT and 13%-32% for noisy aircraft communications. The results confirm that an effective in-set/out-of-set speaker recognition system can be formulated using discriminative training for rapid tagging of input speakers from limited training and test data sizes.
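The likelihood ratio decision and the equal error rate (EER) mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the function names, the use of a single background model score, and the threshold sweep used to approximate the EER are all assumptions made for illustration, and the per-model log-likelihoods are taken as given rather than computed from audio.

```python
from typing import Dict, List, Tuple


def llr_decision(speaker_loglik: Dict[str, float],
                 background_loglik: float,
                 threshold: float) -> Tuple[bool, str, float]:
    """Accept the input as in-set if the best enrolled model beats the
    (assumed) background model by at least `threshold` in log-likelihood."""
    best_speaker = max(speaker_loglik, key=speaker_loglik.get)
    llr = speaker_loglik[best_speaker] - background_loglik
    return llr >= threshold, best_speaker, llr


def equal_error_rate(in_set_llrs: List[float],
                     out_set_llrs: List[float]) -> float:
    """Approximate the EER: sweep thresholds over the observed scores and
    return the mean of the false-reject and false-accept rates at the
    threshold where the two rates are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(in_set_llrs) | set(out_set_llrs)):
        frr = sum(s < t for s in in_set_llrs) / len(in_set_llrs)
        far = sum(s >= t for s in out_set_llrs) / len(out_set_llrs)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    accepted, speaker, llr = llr_decision({"spk_a": -10.0, "spk_b": -12.0}, -11.0, 0.5)
    print(accepted, speaker, llr)
    print(equal_error_rate([2.0, 1.0, 0.5, -0.5], [-2.0, -1.0, 0.0, 1.5]))
```

Raising the threshold trades false acceptances of out-of-set speakers for false rejections of enrolled speakers; the EER reports the operating point where the two error rates coincide.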
Year: 2007
DOI: 10.1109/TASL.2006.881689
Venue: IEEE Transactions on Audio, Speech & Language Processing
Keywords: out-of-set speaker recognition, discriminative in-set, input speaker, test size resource, test data, test data size, likelihood ratio test, measurable improvement, test data duration, discriminative training, conventional likelihood ratio test, limited training, localization, discriminant analysis, speech processing, statistical hypothesis testing, hypothesis test, statistical test, database, decision rule, statistical testing, integrable system, speaker recognition, communication networks, system performance, information retrieval, private information
Field: TIMIT, Likelihood-ratio test, Computer science, Speaker recognition, Artificial intelligence, Speaker diarisation, Discriminative model, Statistical hypothesis testing, Decision rule, Pattern recognition, Speech recognition, Test data, Machine learning
DocType: Journal
Volume: 15
Issue: 2
ISSN: 1558-7916
Citations: 18
PageRank: 0.88
References: 26
Authors (2)
Name: Pongtep Angkititrakul; Order: 1; Citations: 179; PageRank: 15.47
Name: John H. L. Hansen; Order: 2; Citations: 3215; PageRank: 365.75