Title
Rapid speaker adaptation with speaker adaptive training and non-negative matrix factorization
Abstract
In this paper, we describe a novel speaker adaptation algorithm based on Gaussian mixture weight adaptation. A small number of latent speaker vectors are estimated with non-negative matrix factorization (NMF). These base vectors encode the correlations between Gaussian activations as learned from the training data. Expressing the speaker-dependent Gaussian mixture weights as a linear combination of a small number of base vectors reduces the number of parameters that must be estimated from the enrollment data. In order to learn meaningful correlations between Gaussian activations from the training data, the NMF-based weight adaptation was combined with speaker adaptive training based on vocal tract length normalization (VTLN) and feature-space maximum likelihood linear regression (fMLLR). Evaluation on the 5k closed and 20k open vocabulary Wall Street Journal tasks shows a 4% relative word error rate reduction over the speaker-independent recognition system, which already incorporates VTLN. The proposed fast adaptation algorithm, using only a single enrollment sentence, achieves performance similar to fMLLR adapted on 40 enrollment sentences.
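The weight adaptation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian mixture rather than per-state HMM mixtures, and it assumes the standard multiplicative updates for the KL divergence as the NMF solver, which may differ from the estimation procedure in the paper. The function names `nmf_kl` and `adapt_weights`, the matrix layout (Gaussians by speakers), and all default values are hypothetical.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (n_gauss x n_speakers) as W @ H.

    Columns of W play the role of latent base vectors; columns of H hold
    the per-speaker combination coefficients. Uses multiplicative updates
    for the KL divergence (an assumed solver, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def adapt_weights(W, counts, n_iter=100, eps=1e-12):
    """Estimate mixture weights for a new speaker with W held fixed.

    counts: non-negative Gaussian occupancy statistics (length n_gauss)
    collected from the enrollment data. Only rank coefficients are
    estimated, instead of n_gauss free weights.
    """
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        Wh = W @ h + eps
        h *= (W.T @ (counts / Wh)) / (W.sum(axis=0) + eps)
    w = W @ h
    return w / w.sum()  # normalize so the weights sum to one
```

With, say, 20 Gaussians and 3 base vectors, `adapt_weights` estimates 3 coefficients per speaker rather than 20 weights, which is the parameter reduction the abstract credits for adaptation from a single enrollment sentence.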
Year
2011
DOI
10.1109/ICASSP.2011.5947343
Venue
ICASSP
Keywords
speaker adaptation,vocal tract length normalization,regression analysis,maximum likelihood estimation,gaussian activations,fmllr,weight adaptation,speaker dependent gaussian mixture weights,speaker recognition,maximum likelihood linear regression,nonnegative matrix factorization,gaussian processes,vtln,speaker adaptive training,non-negative matrix factorization,feature-space maximum likelihood linear regression,gaussian mixture weight adaptation,data model,non negative matrix factorization,speech recognition,silicon,acoustics,word error rate,data models,hidden markov models,hidden markov model,feature space
Field
Pattern recognition,Computer science,Matrix decomposition,Word error rate,Speech recognition,FMLLR,Gaussian,Speaker recognition,Artificial intelligence,Gaussian process,Non-negative matrix factorization,Hidden Markov model
DocType
Conference
ISSN
1520-6149
E-ISBN
978-1-4577-0537-3
ISBN
978-1-4577-0537-3
Citations
2
PageRank
0.40
References
5
Authors
3
Name              Order  Citations  PageRank
Xueru Zhang       1      22         3.61
Kris Demuynck     2      433        50.53
Hugo Van hamme    3      565        77.43