Abstract
---
State-of-the-art probabilistic models of text such as n-grams require an exponential number of examples as the size of the context grows, a problem that stems from their discrete representation of words. We propose to solve this problem by learning a continuous-valued, low-dimensional mapping of words, and to base our predictions of target-word probabilities on the non-linear dynamics of the latent-space representation of the words in the context window. We build on neural network-based language models; by expressing them as energy-based models, we can further enrich them with additional inputs such as part-of-speech tags, topic information, and graphs of word similarity. We demonstrate significantly lower perplexity on several text corpora, as well as improved word accuracy on speech recognition tasks, compared to Kneser-Ney back-off n-gram language models.
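The kind of model the abstract describes, continuous word embeddings feeding a non-linear layer whose per-word scores act as negative energies normalized by a softmax, can be illustrated in a few lines of NumPy. The following is a minimal sketch of a feedforward continuous-space language model in that spirit, not the authors' implementation; all dimensions, weights, and names are illustrative assumptions, and the feature-rich extensions (POS tags, topics, word-similarity graphs) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: vocabulary, embedding dim, context length, hidden units.
V, d, n_ctx, h = 10_000, 50, 3, 100

C = rng.normal(scale=0.01, size=(V, d))            # continuous word embeddings
W_h = rng.normal(scale=0.01, size=(n_ctx * d, h))  # context -> hidden weights
b_h = np.zeros(h)
W_o = rng.normal(scale=0.01, size=(h, V))          # hidden -> per-word scores
b_o = np.zeros(V)

def next_word_probs(context_ids):
    """P(w | context): embed context words, apply a non-linear hidden
    layer, then softmax the per-word scores (negative energies)."""
    x = C[context_ids].reshape(-1)     # concatenate context embeddings
    hidden = np.tanh(x @ W_h + b_h)    # non-linear transform of latent context
    scores = hidden @ W_o + b_o        # one score per candidate target word
    scores -= scores.max()             # numerical stability
    p = np.exp(scores)
    return p / p.sum()

p = next_word_probs(np.array([12, 7, 345]))
print(p.shape, p.sum())  # (10000,) ~1.0
```

Perplexity, the metric reported in the paper, would then be the exponential of the average negative log probability such a model assigns to held-out target words.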
Year | DOI | Venue |
---|---|---
2010 | 10.1109/SLT.2010.5700858 | Spoken Language Technology Workshop |
Keywords | Field | DocType
---|---|---
natural language processing, neural nets, probability, speech recognition, discrete word representation, energy-based models, feature-rich continuous language models, neural networks, state-of-the-art probabilistic models, topic information, natural language | Perplexity, Cache language model, Computer science, Natural language processing, Artificial intelligence, Artificial neural network, Language model, Pattern recognition, Text corpus, Speech recognition, Natural language, Hidden Markov model, Vocabulary | Conference
ISBN | Citations | PageRank
---|---|---
978-1-4244-7902-3 | 2 | 0.37
References | Authors
---|---
15 | 4
Name | Order | Citations | PageRank |
---|---|---|---
Piotr W. Mirowski | 1 | 178 | 13.09 |
Sumit Chopra | 2 | 2835 | 181.37 |
Suhrid Balakrishnan | 3 | 238 | 14.60 |
Srinivas Bangalore | 4 | 1319 | 157.37 |