Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification

Paper Info

Abstract
Multiple Background Models (M-BMs) [1, 2], formed according to the different Vocal Tract Lengths (VTLs) in the population, have recently been shown to be useful in speaker verification. Each speaker model is adapted from the Background Model (BM) corresponding to that speaker's VTL. During testing, the log-likelihood ratio of the test utterance is calculated between the claimant model and the corresponding BM. In this paper, instead of using a different BM for each speaker, we propose using a single gender-, channel-, and VTL-independent UBM (root-UBM) together with a VTL-dependent mapping function. The proposed concept is inspired by the Feature Mapping (FM) technique used in speaker verification to overcome channel variability. In our proposed method, VTL-specific, gender-independent Gaussian Mixture Models (GMMs) are derived from the root-UBM using Maximum a Posteriori (MAP) adaptation. A mapping relation is then learned between the root-UBM and each VTL-specific GMM. During both training and testing, feature vectors are mapped into the root-UBM space using the best VTL-specific model. Speaker models are then adapted from the root-UBM using the mapped features. During testing, the log-likelihood ratio is calculated between the target model and the root-UBM. Therefore, unlike the M-BM system, there is no need to switch between BMs depending on the claimant. Another advantage of the proposed method is that additional normalization and compensation techniques can be applied easily, since everything operates within a single-UBM framework. Experiments are performed on the NIST 2004 SRE core condition, and we show that the performance of the proposed method is close to that of the M-BM system, both with and without score normalization.
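
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the mapping function, so the code below assumes the classic feature-mapping form: each frame is shifted and scaled from its top-scoring Gaussian in the VTL-specific GMM to the corresponding Gaussian in the root-UBM. Diagonal covariances, mean-only MAP adaptation, the relevance factor of 16, selection of the best VTL model by average log-likelihood, and all function names (`map_to_root`, `map_adapt_means`, `llr`) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

# A GMM is represented here as a tuple (weights, means, variances) with
# shapes (M,), (M, D), (M, D); diagonal covariances are an assumption.

def component_log_likes(X, weights, means, variances):
    """Weighted log-likelihood of each frame under each component: shape (T, M)."""
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = X.shape[1] * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1)
    return np.log(weights) - 0.5 * (log_norm + mahal)

def frame_log_likes(X, gmm):
    """Log-likelihood of each frame under the whole mixture (log-sum-exp)."""
    ll = component_log_likes(X, *gmm)
    peak = ll.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(ll - peak).sum(axis=1))

def map_adapt_means(X, gmm, relevance=16.0):
    """Mean-only MAP adaptation; relevance=16 is an assumed, commonly used value."""
    weights, means, variances = gmm
    ll = component_log_likes(X, weights, means, variances)
    post = np.exp(ll - frame_log_likes(X, gmm)[:, None])  # (T, M) responsibilities
    n = post.sum(axis=0)                                   # soft count per component
    first = post.T @ X / np.maximum(n, 1e-10)[:, None]     # data mean per component
    alpha = (n / (n + relevance))[:, None]
    return weights, alpha * first + (1 - alpha) * means, variances

def map_to_root(X, vtl_gmms, root_gmm):
    """Map frames into the root-UBM space via the best-scoring VTL-specific GMM."""
    # Assumed selection rule: highest average frame log-likelihood.
    best = int(np.argmax([frame_log_likes(X, g).mean() for g in vtl_gmms]))
    w, mu_v, var_v = vtl_gmms[best]
    _, mu_r, var_r = root_gmm
    # Components correspond one-to-one because the VTL GMM is adapted from the root-UBM.
    top = np.argmax(component_log_likes(X, w, mu_v, var_v), axis=1)
    Y = (X - mu_v[top]) * np.sqrt(var_r[top] / var_v[top]) + mu_r[top]
    return Y, best

def llr(Y, target_gmm, root_gmm):
    """Average log-likelihood ratio of mapped frames: target model vs. root-UBM."""
    return float(np.mean(frame_log_likes(Y, target_gmm) - frame_log_likes(Y, root_gmm)))
```

In this sketch, enrollment would call `map_to_root` on the training frames and then `map_adapt_means` on the mapped frames to obtain the speaker model; verification would call `map_to_root` on the test frames and score them with `llr`. Every trial is scored against the same root-UBM, which is the property the abstract highlights over the M-BM system.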

Year: 2011
DOI: 10.1109/ICASSP.2011.5947367
Venue: Acoustics, Speech and Signal Processing
Keywords: Gaussian processes, speaker recognition, FM technique, GMM, Gaussian mixture model, M-BM, MAP, UBM, VTL, VTL-wise model, feature-mapping framework, log likelihood ratio, maximum a posteriori, multiple-background model, speaker verification, vocal tract length, FM, GMM-UBM, Multiple BM, Speaker Verification, VTL-BM
Field: Population, Feature vector, Normalization (statistics), Pattern recognition, Likelihood-ratio test, Computer science, Speech recognition, Speaker recognition, Gaussian process, Artificial intelligence, Maximum a posteriori estimation, Mixture model
DocType: Conference
ISSN: 1520-6149
ISBN: 978-1-4577-0537-3
E-ISBN: 978-1-4577-0537-3
Citations: 1
PageRank: 0.35
References: 4
Authors: 2

Authors (2 rows)

Name	Order	Citations	PageRank
Achintya Kumar Sarkar	1	23	7.81
Srinivasan Umesh	2	93	16.31
