Title | ||
---|---|---|
Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification |
Abstract | ||
---|---|---|
Recently, Multiple Background Models (M-BMs) [1, 2] have been shown to be useful in speaker verification, where the M-BMs are formed based on the different Vocal Tract Lengths (VTLs) among the population. Each speaker model is adapted from the particular Background Model (BM) corresponding to that speaker's VTL. During testing, the log likelihood ratio of the test utterance is calculated between the claimant model and the corresponding BM. In this paper, instead of using a different BM for different speakers, we propose the use of a single gender-, channel- and VTL-independent UBM (root-UBM) together with a VTL-dependent mapping function. The proposed concept is inspired by the Feature Mapping (FM) technique used in speaker verification to overcome channel variability. In our proposed method, VTL-specific, gender-independent Gaussian Mixture Models (GMMs) are derived from the root-UBM using Maximum a Posteriori (MAP) adaptation. The mapping relation is then learned between the root-UBM and each VTL-specific GMM. During the training and testing phases, feature vectors are mapped into the root-UBM space using the best VTL-specific model. Speaker models are then adapted from the root-UBM using the mapped features. During testing, the log likelihood ratio is calculated between the target model and the root-UBM. Therefore, unlike the M-BM system, there is no need to switch between BMs depending on the claimant. Another advantage of the proposed method is that additional normalization/compensation techniques can be applied easily, since it operates in a single-UBM framework. The experiments are performed on the NIST 2004 SRE core condition, and we show that the performance of the proposed method is close to that of the M-BM system, both with and without score normalization. |
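
The pipeline described in the abstract (pick the best VTL-specific GMM, map features into the root-UBM space, score against the root-UBM) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the mapping equation, so the sketch assumes the commonly used feature-mapping form (top-1 Gaussian selection, then mean shift and variance scaling), diagonal covariances, and mean-only MAP adaptation so that weights and variances are shared with the root-UBM. All function and variable names are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Per-component log density of diagonal Gaussians.
    x: (T, D) frames; means, variances: (M, D). Returns (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def avg_log_likelihood(x, weights, means, variances):
    """Average per-frame GMM log likelihood (log-sum-exp over components)."""
    scores = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    m = scores.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.sum(np.exp(scores - m), axis=1))))

def select_best_vtl(x, weights, vtl_models):
    """Index of the VTL-specific GMM that scores the utterance highest.
    vtl_models: list of (means, variances) pairs (assumed layout)."""
    lls = [avg_log_likelihood(x, weights, mu, var) for mu, var in vtl_models]
    return int(np.argmax(lls))

def map_to_root(x, weights, vtl_means, vtl_vars, root_means, root_vars):
    """Map frames from a VTL-specific GMM space into the root-UBM space.
    Assumed mapping: y = (x - mu_vtl_i) * sigma_root_i / sigma_vtl_i + mu_root_i,
    where i is the top-scoring component of the VTL-specific GMM."""
    scores = np.log(weights)[None, :] + log_gauss_diag(x, vtl_means, vtl_vars)
    top = np.argmax(scores, axis=1)                   # top-1 component per frame
    scale = np.sqrt(root_vars[top] / vtl_vars[top])   # ratio of standard deviations
    return (x - vtl_means[top]) * scale + root_means[top]

def llr(x_mapped, weights, target_means, root_means, root_vars):
    """Log likelihood ratio between the target model and the root-UBM."""
    return (avg_log_likelihood(x_mapped, weights, target_means, root_vars)
            - avg_log_likelihood(x_mapped, weights, root_means, root_vars))

# Toy one-dimensional example with two mixture components.
weights = np.array([0.5, 0.5])
vtl_means, vtl_vars = np.array([[0.0], [10.0]]), np.array([[1.0], [4.0]])
root_means, root_vars = np.array([[1.0], [20.0]]), np.array([[4.0], [1.0]])
frames = np.array([[1.0], [12.0]])
mapped = map_to_root(frames, weights, vtl_means, vtl_vars, root_means, root_vars)
```

Because every utterance ends up in the root-UBM space, the same `llr` call serves all claimants, which is the property the abstract contrasts with switching between background models in the M-BM system.
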
Year | DOI | Venue |
---|---|---|
2011 | 10.1109/ICASSP.2011.5947367 | Acoustics, Speech and Signal Processing |
Keywords | Field | DocType |
Gaussian processes,speaker recognition,FM technique,GMM,Gaussian mixture model,M-BM,MAP,UBM,VTL,VTL-wise model,feature-mapping framework,log likelihood ratio,maximum a posteriori,multiple-background model,speaker verification,vocal tract length,FM,GMM-UBM,Multiple BM,Speaker Verification,VTL-BM | Population,Feature vector,Normalization (statistics),Pattern recognition,Likelihood-ratio test,Computer science,Speech recognition,Speaker recognition,Gaussian process,Artificial intelligence,Maximum a posteriori estimation,Mixture model | Conference |
ISSN | ISBN | Citations |
1520-6149 | 978-1-4577-0537-3 (E-ISBN) | 1 |
PageRank | References | Authors |
0.35 | 4 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Achintya Kumar Sarkar | 1 | 23 | 7.81 |
Srinivasan Umesh | 2 | 93 | 16.31 |