Title
Web-Data Augmented Language Models for Mandarin Conversational Speech Recognition
Abstract
Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web text collection targeted at a conversational style to augment small sets of transcribed speech; here we extend these techniques to Mandarin. In addition, we investigate different techniques for topic adaptation. Experiments in recognizing Mandarin telephone conversations show that using filtered web data yields a 28% reduction in perplexity and a 7% reduction in character error rate, with most of the gain coming from the general filtered web data.
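The abstract does not say how the web text is filtered or how character error rate is scored. The sketch below is illustrative only and is not the authors' method. It assumes a perplexity-based selection criterion: each web sentence is scored with an add-one smoothed character bigram model trained on the in-domain transcripts, and sentences below a threshold are kept. It also computes character error rate as character-level Levenshtein distance divided by reference length. Characters are used as units because Mandarin text is written without word boundaries. The names `CharBigramLM`, `select_web_text`, `char_error_rate` and the threshold value are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


class CharBigramLM:
    """Add-one (Laplace) smoothed character bigram model (hypothetical helper).

    Characters unseen in training are mapped to a single <unk> token so the
    smoothed distribution stays normalized over a finite token set.
    """

    def __init__(self, corpus):
        corpus = list(corpus)
        self.vocab = {"</s>", "<unk>"}
        for sent in corpus:
            self.vocab.update(sent)  # adds each character of the sentence
        self.history = Counter()  # how often each token serves as a bigram history
        self.pairs = Counter()    # count of each (previous, current) bigram
        for sent in corpus:
            toks = ["<s>"] + list(sent) + ["</s>"]
            for prev, cur in zip(toks, toks[1:]):
                self.history[prev] += 1
                self.pairs[(prev, cur)] += 1

    def perplexity(self, sent):
        toks = ["<s>"] + [c if c in self.vocab else "<unk>" for c in sent] + ["</s>"]
        v = len(self.vocab)  # predictable tokens, including </s> and <unk>
        log_prob = 0.0
        for prev, cur in zip(toks, toks[1:]):
            log_prob += math.log((self.pairs[(prev, cur)] + 1) / (self.history[prev] + v))
        return math.exp(-log_prob / (len(toks) - 1))


def select_web_text(lm, web_sentences, threshold):
    """Assumed filter: keep web sentences whose in-domain perplexity is below threshold."""
    return [s for s in web_sentences if lm.perplexity(s) < threshold]


def char_error_rate(ref, hyp):
    """Character-level edit distance (sub + del + ins) divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        row = [i]
        for j, h in enumerate(hyp, 1):
            row.append(min(prev_row[j] + 1,               # deletion of a reference character
                           row[j - 1] + 1,                # insertion of a hypothesis character
                           prev_row[j - 1] + (r != h)))   # substitution, or match at no cost
        prev_row = row
    return prev_row[-1] / len(ref)


if __name__ == "__main__":
    lm = CharBigramLM(["你好吗", "我很好", "你在哪里"])
    web = ["你好吗", "股票市场今日收盘"]
    print(select_web_text(lm, web, threshold=8.0))
    print(char_error_rate("我很好", "我好"))
```

A real system would more likely use word- or class-based n-gram models with stronger smoothing and tune the threshold on held-out data. It might also interpolate a web-data model with the in-domain model rather than pooling all text. The character bigram here only keeps the sketch short.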
Year
2005
DOI
10.1109/ICASSP.2005.1415182
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05)
Keywords
Internet, error statistics, learning (artificial intelligence), natural languages, speech recognition, text analysis, Mandarin conversational speech recognition, Mandarin speech recognition, Mandarin telephone conversations, Web-based text collection, Web-data augmented language models, character error rate, language model training, perplexity, topic adaptation
Field
Perplexity, Computer science, Word error rate, Speech recognition, Natural language, Artificial intelligence, Natural language processing, Information engineering, Telephony, Mandarin Chinese, Language model, The Internet
DocType
Conference
Volume
1
ISSN
1520-6149
ISBN
0-7803-8874-7
Citations
25
PageRank
1.75
References
7
Authors
Tim Ng, Mari Ostendorf, Mei-Yuh Hwang, Manhung Siu, Ivan Bulyko, Xin Lei