Title
Large-scale discriminative language model reranking for voice-search
Abstract
We present a distributed framework for large-scale discriminative language models that can be integrated within a large vocabulary continuous speech recognition (LVCSR) system using lattice rescoring. We intentionally use a weakened acoustic model in a baseline LVCSR system to generate candidate hypotheses for voice-search data; this allows us to utilize large amounts of unsupervised data to train our models. We propose an efficient and scalable MapReduce framework that uses a perceptron-style distributed training strategy to handle these large amounts of data. We report small but significant improvements in recognition accuracy on a standard voice-search data set using our discriminative reranking model. We also analyze the main parameters of our models, including model size, feature types, and partition size in the MapReduce framework, with supporting experiments.
Year
2012
Venue
WLM@NAACL-HLT
Keywords
standard voice-search data, unsupervised data, large vocabulary continuous speech, scalable mapreduce framework, voice-search data, discriminative reranking model, large-scale discriminative language model, mapreduce framework, model size, large amount
Field
Computer science, Speech recognition, Natural language processing, Artificial intelligence, Discriminative model, Vocabulary, Machine learning, Language model, Voice search, Scalability, Acoustic model
DocType
Conference
Citations
3
PageRank
0.47
References
11
Authors
4
Name             Order  Citations  PageRank
Preethi Jyothi   1      57         7.85
Leif Johnson     2      37         4.34
Ciprian Chelba   3      1055       111.19
Brian Strope     4      95         10.99