Abstract |
---|
We investigate the performance of the Structured Language Model (SLM), measured by perplexity (PPL), when its components are modeled by connectionist models. The connectionist models use a distributed representation of the items in the history and make much better use of context than currently used interpolated or back-off models, not only because of the connectionist model's inherent ability to combat data sparseness, but also because of the sublinear growth in model size as the context length increases. The connectionist models can be further trained by an EM procedure similar to the one previously used for training the SLM. Our experiments show that, after interpolation with a baseline trigram language model, the connectionist models significantly improve PPL over the interpolated and back-off models on the UPENN Treebank corpus. The EM training procedure improves the connectionist models further by using hidden events obtained by the SLM parser. |
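The abstract reports PPL after linearly interpolating the connectionist model with a baseline trigram. A minimal sketch of that evaluation step, with made-up per-word probabilities and an assumed interpolation weight (none of the numbers come from the paper):

```python
import math

def interpolate(p_conn, p_trigram, lam):
    """Linear interpolation of connectionist and trigram word probabilities."""
    return lam * p_conn + (1.0 - lam) * p_trigram

def perplexity(word_probs):
    """PPL = exp of the negative average log-probability per word."""
    avg_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_prob)

# Illustrative per-word probabilities for a 4-word test sequence
# (hypothetical values, chosen only to show the computation).
p_conn = [0.20, 0.05, 0.10, 0.15]
p_tri = [0.10, 0.12, 0.08, 0.05]
lam = 0.5  # assumed interpolation weight

mixed = [interpolate(a, b, lam) for a, b in zip(p_conn, p_tri)]
print(perplexity(mixed))
```

Lower perplexity means the interpolated model assigns higher probability to the held-out text; the paper's claim is that the interpolated connectionist model attains lower PPL than interpolated/back-off n-gram models alone.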
Year | DOI | Venue |
---|---|---|
2003 | 10.3115/1119355.1119376 | EMNLP |
Keywords | Field | DocType
---|---|---|
better use, training connectionist model, em procedure, slm parser, structured language model, em training procedure, back-off model, baseline trigram language model, upenn treebank corpus, model size, connectionist model | Perplexity, Trigram language model, Computer science, Interpolation, Natural language processing, Artificial intelligence, Distributed representation, Language model, Connectionism, Speech recognition, Treebank, Parsing, Machine learning | Conference
Volume | Citations | PageRank
---|---|---|
W03-10 | 11 | 3.87

References | Authors |
---|---|
12 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peng Xu | 1 | 23 | 5.68 |
Ahmad Emami | 2 | 138 | 26.52 |
Frederick Jelinek | 3 | 139 | 23.22 |