Abstract |
---|
A language model (LM) is an important component of a speech recognition system. Language model adaptation techniques use a large amount of source-domain data and a limited amount of target-domain data to improve the performance of language models in the target domain. Although text datasets are easy to obtain, there is no public Chinese text dataset for language model adaptation tasks. This paper presents a language model adaptation dataset consisting of news data from four domains: sports, stocks, fashion, and finance. The discrepancy between the domains is evaluated. Model-combination-based adaptation of n-gram models is evaluated on the dataset, along with three different fine-tuning adaptation methods for recurrent neural network language models (RNNLMs). WER results on AIShell speech data with language models trained on this dataset are also provided. Lattice rescoring with the adapted RNNLM yields an absolute WER reduction of 4.74%. |
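The model-combination-based n-gram adaptation the abstract mentions is commonly realized as linear interpolation of a source-domain and a target-domain model, with the mixture weight tuned on held-out target data. The following is a minimal toy sketch of that idea, not the paper's exact method; the unigram tables, the `held_out` word list, and all function names are hypothetical stand-ins for real n-gram models estimated from the corpora.

```python
import math

# Hypothetical toy unigram models; in practice these would be full n-gram
# models estimated on a large source-domain corpus and a small target-domain corpus.
source_lm = {"the": 0.05, "stock": 0.001, "market": 0.002, "game": 0.004}
target_lm = {"the": 0.04, "stock": 0.010, "market": 0.008, "game": 0.001}

def interp_prob(word, lam, floor=1e-6):
    """Linear interpolation: P(w) = lam * P_target(w) + (1 - lam) * P_source(w).

    `floor` is a crude stand-in for proper smoothing of unseen words.
    """
    return lam * target_lm.get(word, floor) + (1 - lam) * source_lm.get(word, floor)

def perplexity(words, lam):
    """Perplexity of the interpolated model on a held-out word sequence."""
    logp = sum(math.log(interp_prob(w, lam)) for w in words)
    return math.exp(-logp / len(words))

# Pick the interpolation weight by grid search on held-out target-domain text.
held_out = ["the", "stock", "market", "the", "stock"]
best_lam = min((l / 10 for l in range(11)),
               key=lambda l: perplexity(held_out, l))
```

In real toolkits the same tuning is usually done by expectation maximization rather than grid search, but the objective (held-out perplexity of the mixture) is the same.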
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ISCSLP.2018.8706600 | 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) |
Keywords | Field | DocType
---|---|---
Adaptation models, Sports, Data models, Training, Testing, Vocabulary, Artificial neural networks | Data modeling, Recurrent neural network language models, Computer science, Speech recognition, Artificial neural network, Vocabulary, Language model | Conference
ISBN | Citations | PageRank
---|---|---
978-1-5386-5627-3 | 0 | 0.34
References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Ye Bai | 1 | 7 | 5.52 |
Jianhua Tao | 2 | 848 | 138.00 |
Jiangyan Yi | 3 | 19 | 17.99 |
Zhengqi Wen | 4 | 86 | 24.41 |
Cunhang Fan | 5 | 0 | 1.35 |