Title
Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices
Abstract
This paper applies the recently proposed SPAM models for acoustic modeling in a Speaker Adaptive Training (SAT) context on large vocabulary conversational speech databases, including the Switchboard database. SPAM models are Gaussian mixture models in which a subspace constraint is placed on the precision and mean matrices (although this paper focuses on the case of unconstrained means). They include diagonal covariance, full covariance, MLLT, and EMLLT models as special cases. Adaptation is carried out with maximum likelihood estimation of the means and feature-space under the SPAM model. This paper shows the first experimental evidence that the SPAM models can achieve significant word-error-rate improvements over state-of-the-art diagonal covariance models, even when those diagonal models are given the benefit of choosing the optimal number of Gaussians (according to the Bayesian Information Criterion). It is also the first to apply SPAM models in a SAT context. All experiments are performed on the IBM "Superhuman" speech corpus, a challenging and diverse conversational speech test set that includes the Switchboard portion of the 1998 Hub5e evaluation data set.
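The abstract's core idea, a subspace constraint on precision (inverse covariance) matrices, can be illustrated with a minimal sketch. This is not the paper's code; the function names, dimensions, and weights below are hypothetical. The sketch assumes the standard SPAM-style parameterization in which each Gaussian's precision matrix is a per-Gaussian weighted combination of symmetric basis matrices shared across all Gaussians (diagonal covariance is the special case where the basis consists of single-entry diagonal matrices).

```python
import numpy as np

def spam_precision(weights, basis):
    """Combine shared symmetric basis matrices S_k into one precision
    matrix P = sum_k weights[k] * S_k (the subspace constraint)."""
    return np.einsum('k,kij->ij', weights, basis)

def gauss_loglik(x, mean, precision):
    """Log-density of N(mean, precision^{-1}) evaluated at x."""
    d = x - mean
    sign, logdet = np.linalg.slogdet(precision)
    return 0.5 * (logdet - len(x) * np.log(2.0 * np.pi) - d @ precision @ d)

rng = np.random.default_rng(0)
dim, K = 4, 6  # illustrative feature dimension and subspace size

# Random symmetric basis; including the identity with a large weight keeps
# the combination positive definite in this toy example (a real system
# must enforce positive definiteness during training).
basis = np.array([np.eye(dim)] +
                 [(A + A.T) / 2.0 for A in rng.standard_normal((K - 1, dim, dim))])
weights = np.array([5.0] + [0.1] * (K - 1))

P = spam_precision(weights, basis)
assert np.all(np.linalg.eigvalsh(P) > 0)  # valid precision matrix
print(gauss_loglik(rng.standard_normal(dim), np.zeros(dim), P))
```

The point of the constraint is that only the K per-Gaussian weights vary across mixture components, while the basis matrices are shared, giving full-covariance modeling power at a fraction of the parameter cost.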
Year
2003
Venue
INTERSPEECH
Keywords
bayesian information criterion, maximum likelihood estimate, word error rate, feature space
Field
Diagonal, Speech corpus, Bayesian information criterion, Computer science, Artificial intelligence, Covariance, Subspace topology, Pattern recognition, Speech recognition, Vocabulary, Machine learning, Mixture model, Test set
DocType
Conference
Citations
6
PageRank
0.85
References
15
Authors
5
Name | Order | Citations | PageRank
Scott Axelrod | 1 | 113 | 10.14
Vaibhava Goel | 2 | 376 | 41.25
B. Kingsbury | 3 | 41753 | 35.43
Karthik Visweswariah | 4 | 400 | 38.22
Ramesh A. Gopinath | 5 | 323 | 42.58