Title | ||
---|---|---|
Performance of a SCFG-based language model with training data sets of increasing size |
Abstract | ||
---|---|---|
In this paper, a hybrid language model that combines a word-based n-gram and a category-based Stochastic Context-Free Grammar (SCFG) is evaluated for training data sets of increasing size. Different estimation algorithms for learning SCFGs in General Format and in Chomsky Normal Form are considered. Experiments on the UPenn Treebank corpus are reported, with results measured by test set perplexity and by word error rate in a speech recognition experiment. |
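
The two evaluation measures named in the abstract, test set perplexity and word error rate, have standard textbook definitions. The sketch below illustrates those definitions only; it is not taken from the paper, and the function names `perplexity` and `word_error_rate` are hypothetical. It assumes the language model has already assigned a non-zero probability to each test word and that the reference transcript is non-empty.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test set, given the probability the model assigned to each word.

    Assumes every probability is > 0 and the list is non-empty.
    """
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace-tokenised strings and a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Lower values are better for both measures: perplexity reflects how well the model predicts held-out text, while word error rate reflects recognition output quality against a reference transcript.
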
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11492542_72 | IbPRIA (2) |
Keywords | Field | DocType |
different estimation algorithm,upenn treebank corpus,chomsky normal form,test set perplexity,category-based stochastic context-free grammar,word error rate,training data set,speech recognition experiment,hybrid language model,scfg-based language model,general format,speech recognition,normal form,stochastic context free grammar,language model | Perplexity,Terminal and nonterminal symbols,Context-free grammar,Computer science,Word error rate,Speech recognition,Natural language processing,Treebank,Artificial intelligence,Chomsky normal form,Language model,Test set | Conference |
Volume | ISSN | ISBN |
3523 | 0302-9743 | 3-540-26154-0 |
Citations | PageRank | References |
1 | 0.35 | 10 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Joan Andreu Sánchez | 1 | 55 | 4.78 |
José Miguel Benedí | 2 | 7 | 1.31 |
Diego Linares | 3 | 24 | 4.54 |