Abstract
---|
The current "state-of-the-art" in phonetic speaker recognition uses relative frequencies of phone n-grams as features for training speaker models and for scoring test-target pairs. Typically, these relative frequencies are computed from a simple 1-best phone decoding of the input speech. In this paper, we present results on the Switchboard-2 corpus, where we compare 1-best phone decodings with lattice phone decodings for phonetic speaker recognition. The phone decodings are used to compute relative frequencies of phone bigrams, which are then used as inputs to two standard phonetic speaker recognition systems: a system based on log-likelihood ratios (LLRs) [1, 2] and a system based on support vector machines (SVMs) [3]. In each experiment, the lattice phone decodings achieve relative reductions in equal-error rate (EER) of 31% to 66% below the EERs of the 1-best phone decodings. Our best phonetic system achieves an EER of 2.0% with 8-conversation training, and 1.4% when combined with a GMM-based system. |
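The feature extraction the abstract describes — relative frequencies of phone bigrams computed from a 1-best phone decoding — can be sketched as below. This is a minimal illustration, not the paper's implementation: the phone labels and the example utterance are hypothetical, and the paper's actual systems feed such frequencies into LLR- and SVM-based speaker models.

```python
from collections import Counter

def bigram_relative_freqs(phones):
    """Relative frequencies of phone bigrams in a 1-best phone decoding.

    Each adjacent pair of phones counts once; frequencies are normalized
    so they sum to 1 over all bigrams observed in the sequence.
    """
    bigrams = list(zip(phones, phones[1:]))
    counts = Counter(bigrams)
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

# Hypothetical 1-best decoding of a short utterance ("hello" with silence).
decoding = ["sil", "hh", "ah", "l", "ow", "sil"]
freqs = bigram_relative_freqs(decoding)
# freqs[("hh", "ah")] == 0.2  (1 of the 5 bigrams in the sequence)
```

A lattice decoding would instead yield *expected* bigram counts summed over all paths weighted by their posteriors; the normalization step is the same.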
Year | DOI | Venue |
---|---|---|
2005 | 10.1109/ICASSP.2005.1415077 | 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING |
Keywords | Field | DocType
---|---|---|
speech,score test,gaussian mixture models,support vector machine,log likelihood ratio,learning (artificial intelligence),lattices,sampling methods,support vector machines,decoding,nist,testing,svm,speaker recognition,computer science | Information processing,Pattern recognition,Computer science,Support vector machine,Speech recognition,Speaker recognition,NIST,Phone,Bigram,Artificial intelligence,Decoding methods,Mixture model | Conference
ISSN | Citations | PageRank |
---|---|---|
1520-6149 | 24 | 1.46 |
References | Authors
---|---|
7 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Andrew O. Hatch | 1 | 53 | 3.64 |
Barbara Peskin | 2 | 176 | 18.45 |
Andreas Stolcke | 3 | 6690 | 712.46 |