Title |
---|
textTOvec: Deep Contextualized Neural Autoregressive Models of Language with Distributed Compositional Prior |
Abstract |
---|
We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context). (1) No language structure in context: probabilistic topic models ignore word order by summarizing a given context as a bag-of-words, and consequently the semantics of the words in the context is lost. We unite two complementary paradigms of learning the meaning of word occurrences by combining a topic model (TM, e.g., DocNADE) and a language model (LM) in a unified probabilistic framework, named ctx-DocNADE. The LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation from the entire document and discovers its underlying thematic structure. (2) Limited context and/or smaller training corpus of documents: in settings with few word occurrences (i.e., lack of context) in short texts, or with data sparsity in a corpus of few documents, the application of TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language modelling approach: we use word embeddings as input to the LSTM-LM with the aim of improving the word-topic mapping on a smaller and/or short-text corpus. This DocNADE extension is named ctx-DocNADEe. The proposed neural autoregressive topic model variants, coupled with neural LMs and embedding priors, consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains. |
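The abstract's core idea, combining a bag-of-words autoregressive topic model with an ordered language-model signal in one hidden state, can be sketched in a few lines. The snippet below is a minimal toy illustration, not the paper's implementation: the LSTM-LM outputs are supplied as a precomputed array (here random placeholders standing in for a trained LSTM), and the mixing weight `lam`, the `tanh` activation, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10, 8   # toy vocabulary size and hidden size (hypothetical values)
lam = 0.5      # mixing weight between TM and LM context signals (hypothetical)

# DocNADE-style parameters
W = rng.normal(scale=0.1, size=(H, V))  # word-to-hidden embedding matrix
c = np.zeros(H)                         # hidden bias
U = rng.normal(scale=0.1, size=(V, H))  # hidden-to-output matrix
b = np.zeros(V)                         # output bias

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ctx_docnade_log_likelihood(doc, lstm_outputs):
    """Autoregressive log-likelihood of a document, mixing two context views.

    doc          : list of word indices v_1..v_D
    lstm_outputs : array (D, H) of per-position LM hidden states
                   (stand-in for a trained LSTM-LM, supplied externally here)
    """
    ll = 0.0
    bow_sum = np.zeros(H)  # running bag-of-words sum over preceding words
    for i, v in enumerate(doc):
        # hidden state combines unordered context (TM) and ordered context (LM)
        h = np.tanh(c + bow_sum + lam * lstm_outputs[i])
        p = softmax(U @ h + b)       # distribution over the vocabulary
        ll += np.log(p[v])           # probability of the observed word
        bow_sum += W[:, v]           # fold current word into the next step's context
    return ll

doc = [3, 1, 4, 1, 5]
lstm_outputs = rng.normal(scale=0.1, size=(len(doc), H))
print(ctx_docnade_log_likelihood(doc, lstm_outputs))
```

In this reading, the ctx-DocNADEe variant would additionally feed pretrained word embeddings into the LSTM-LM that produces `lstm_outputs`, which is where the external knowledge enters on small or short-text corpora.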
Year | Venue | Field |
---|---|---|
2018 | International Conference on Learning Representations | Interpretability, Perplexity, Word order, Computer science, Natural language processing, Artificial intelligence, Topic model, Probabilistic logic, Syntax, Semantics, Machine learning, Collocation
DocType | Volume | Citations
---|---|---
Journal | abs/1810.03947 | 1

PageRank | References | Authors
---|---|---
0.35 | 18 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Pankaj Gupta | 1 | 1479 | 133.85 |
Yatin Chaudhary | 2 | 2 | 1.72 |
Florian Buettner | 3 | 1 | 2.72 |
Hinrich Schütze | 4 | 2113 | 362.21 |