On-Line Language Model Biasing For Multi-Pass Automatic Speech Recognition

Paper Info

Title
On-Line Language Model Biasing For Multi-Pass Automatic Speech Recognition

Abstract
The language model (LM) is a critical component in statistical automatic speech recognition (ASR) systems, serving to establish a probability distribution over the hypothesis space. In typical use, the LM is trained off-line and remains static at run-time. While cache LMs, dialogue/style adaptation, and information retrieval-based biasing provide some ability for modifying the LM at run-time, they are limited in scope, susceptible to recognition error, place restrictions on the training data and/or test sets, or cannot be implemented for on-line, interactive systems. In this paper, we describe a novel LM biasing method suitable for multi-pass ASR systems. We use k-best lists from the initial recognition pass to obtain a confidence-weighted biasing of the LM training corpus. The latter is used to train an LM biased to the test input. The biased LM is used in the second pass to obtain refined hypotheses either by re-decoding or by re-ranking the k-best list. We sketch an on-line implementation of this scheme that lends itself to integration within low-latency systems. The proposed method is robust to recognition error, and operates on individual utterances without the need for dialogue context. The biased LMs provide significant reduction in perplexity and consistent improvement in word error rate (WER) over unbiased, state-of-the-art, large-vocabulary baseline ASR systems. On the Farsi and English test sets, we obtained relative reductions in perplexity of 24.5% and 31.6%, respectively. Additionally, relative reductions of 1.6% and 1.8% in WER were obtained for large-vocabulary Farsi and English ASR, respectively.
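
The two-pass idea in the abstract can be illustrated with a toy sketch. The abstract does not specify the similarity measure, the LM type, or how scores are combined, so everything below is an assumption made for illustration: Jaccard word overlap as the similarity, an add-alpha smoothed unigram LM, log-linear score combination, and the hypothetical names `sentence_weights`, `weighted_unigram_lm`, and `rerank`. It is not the authors' implementation.

```python
import math
from collections import Counter


def sentence_weights(kbest, corpus):
    """Weight each training sentence by its confidence-weighted overlap with the k-best list.

    kbest: list of (hypothesis, confidence) pairs from the first pass.
    Jaccard overlap is an assumed stand-in for the paper's similarity measure.
    """
    total_conf = sum(conf for _, conf in kbest)
    weights = []
    for sent in corpus:
        sent_words = set(sent.split())
        weight = 0.0
        for hyp, conf in kbest:
            hyp_words = set(hyp.split())
            union = sent_words | hyp_words
            overlap = len(sent_words & hyp_words) / len(union) if union else 0.0
            # Hypotheses with higher normalized confidence contribute more.
            weight += (conf / total_conf) * overlap
        weights.append(weight)
    return weights


def weighted_unigram_lm(corpus, weights, vocab, alpha=0.1):
    """Train an add-alpha smoothed unigram LM from fractionally weighted sentences."""
    counts = Counter()
    for sent, weight in zip(corpus, weights):
        for word in sent.split():
            counts[word] += weight
    denom = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / denom for word in vocab}


def rerank(kbest, lm, lm_weight=1.0):
    """Re-rank the k-best list by log-confidence plus weighted biased-LM log-probability."""
    def score(item):
        hyp, conf = item
        lm_logprob = sum(math.log(lm[word]) for word in hyp.split())
        return math.log(conf) + lm_weight * lm_logprob
    return sorted(kbest, key=score, reverse=True)


# Toy data (invented for illustration only).
corpus = ["book a flight to boston", "show me the weather", "flight to denver please"]
kbest = [("book a flight to austin", 0.6), ("look a fight to boston", 0.4)]
vocab = {w for s in corpus for w in s.split()} | {w for h, _ in kbest for w in h.split()}

weights = sentence_weights(kbest, corpus)
biased_lm = weighted_unigram_lm(corpus, weights, vocab)
reranked = rerank(kbest, biased_lm)
```

Because the sentence weights are fractional counts, training sentences that resemble none of the hypotheses contribute nothing beyond smoothing, which is one simple way to realize a corpus "biased to the test input". A real system would presumably use higher-order n-gram LMs and interpolate with the unbiased baseline.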

Year	Venue	Keywords
2011	12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5	speech recognition, language model biasing, multipass ASR, k-best list rescoring
Field	DocType	Citations
Computer science, Speech recognition, Natural language processing, Artificial intelligence, Language model, Acoustic model, Biasing	Conference	1
PageRank	References	Authors
0.35	1	4

Authors (4 rows)

Name	Order	Citations	PageRank
Sankaranarayanan Ananthakrishnan	1	134	13.29
Stavros Tsakalidis	2	213	13.83
Rohit Prasad	3	465	39.06
Premkumar Natarajan	4	874	79.46