Title: textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior
Abstract: We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summarizing a given context as a bag-of-words, and consequently the semantics of the words in the context is lost. We unite two complementary paradigms of learning the meaning of word occurrences by combining a topic model (TM, e.g., DocNADE) and an LSTM-based language model (LSTM-LM) in a unified probabilistic framework, named ctx-DocNADE: the LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation from the entire document and discovers the underlying thematic structure. (2) Limited Context and/or Small Training Corpus: In settings with few word occurrences (i.e., lack of context) in short texts, or with data sparsity in a corpus of few documents, the application of TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language-modelling approach: we use word embeddings as input to an LSTM-LM with the aim of improving the word-topic mapping on a smaller and/or short-text corpus. The proposed DocNADE extension is named ctx-DocNADEe. We present novel neural autoregressive topic model variants coupled with neural LMs and embedding priors that consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains.
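The mechanism the abstract describes (a DocNADE-style autoregressive topic state combined with an LSTM-LM hidden state before scoring the next word) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption made for illustration, including the class name `CtxDocNADESketch`, the layer sizes, and the mixing weight `lam`; it is a reconstruction from the abstract alone, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the ctx-DocNADE idea from the abstract:
# fuse a DocNADE-style topic state (order-free sum over preceding words)
# with an LSTM-LM state (order-aware) to estimate P(word | context).
# All names, sizes and the mixing weight `lam` are illustrative assumptions.
import torch
import torch.nn as nn

class CtxDocNADESketch(nn.Module):
    def __init__(self, vocab_size=2000, hidden=128, emb_dim=100, lam=0.5):
        super().__init__()
        self.W = nn.Embedding(vocab_size, hidden)      # TM word-to-hidden weights
        self.emb = nn.Embedding(vocab_size, emb_dim)   # LM embeddings (pretrained ones would play the role of the prior in ctx-DocNADEe)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)       # scores a softmax over the vocabulary
        self.lam = lam                                 # weight of the LM signal

    def forward(self, doc):                            # doc: (batch, length) word ids
        # DocNADE-style topic state at position i: sum of weights of words < i
        w = self.W(doc)
        h_tm = torch.tanh(torch.cumsum(w, dim=1) - w)  # exclusive prefix sum
        # LSTM-LM state, shifted so position i only sees words < i
        h_lm, _ = self.lstm(self.emb(doc))
        h_lm = torch.cat([torch.zeros_like(h_lm[:, :1]), h_lm[:, :-1]], dim=1)
        # Combine the complementary representations and score the next word
        return self.out(h_tm + self.lam * h_lm)        # (batch, length, vocab) logits

logits = CtxDocNADESketch()(torch.randint(0, 2000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 2000])
```

In this sketch, `h_tm` ignores word order while `h_lm` encodes it, and their sum feeds a single softmax per position, mirroring the abstract's claim that the two paradigms of learning word meaning are complementary.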
Year: 2018
Venue: international conference on learning representations
Field: Interpretability, Perplexity, Word order, Computer science, Natural language processing, Artificial intelligence, Topic model, Probabilistic logic, Syntax, Semantics, Machine learning, Collocation
DocType:
Volume: abs/1810.03947
Citations: 1
Journal:
PageRank: 0.35
References: 18
Authors: 4
Name              Order  Citations  PageRank
Pankaj Gupta      1      14791      33.85
Yatin Chaudhary   2      2          1.72
Florian Buettner  3      1          2.72
Hinrich Schütze   4      21133      62.21