Title
Short text topic modeling by exploring original documents.
Abstract
Topic modeling for short texts faces a tough challenge owing to the sparsity problem. An effective solution is to aggregate short texts into long pseudo-documents before training a standard topic model. The main concern with this solution is how the short texts are aggregated. A recently developed self-aggregation-based topic model (SATM) can adaptively aggregate short texts without using heuristic information. However, the model definition of SATM is somewhat rigid, and more importantly, it tends to overfit and is time-consuming on large-scale corpora. To improve on SATM, we propose a generalized topic model for short texts, namely the latent topic model (LTM). In LTM, we assume that the observable short texts are snippets of normal long texts (namely, original documents) generated by a given standard topic model, but that their original-document memberships are unknown. With Gibbs sampling, LTM drives an adaptive aggregation process for short texts and simultaneously estimates the other latent variables of interest. Additionally, we propose a mini-batch scheme for fast inference. Experimental results indicate that LTM is competitive with state-of-the-art baseline models on short text topic modeling.
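The self-aggregation idea described in the abstract lends itself to a compact illustration. The sketch below is a toy collapsed Gibbs sampler in the spirit of LTM, not the authors' reference implementation: each short text is treated as a snippet of one of D latent original documents whose topic mixtures follow a standard LDA prior, and the sampler alternates between resampling each snippet's document membership and resampling per-token topics. The function name ltm_gibbs, the hyperparameter defaults (K, D, alpha, beta), and the synthetic data are all assumptions for illustration; the paper's mini-batch scheme for fast inference is not shown.

import numpy as np

rng = np.random.default_rng(0)

def ltm_gibbs(short_texts, V, K=10, D=50, alpha=0.1, beta=0.01, iters=100):
    # Toy Gibbs sampler in the spirit of LTM (hypothetical names/defaults):
    # each short text is a snippet of one of D latent "original documents"
    # generated by standard LDA with symmetric priors alpha, beta.
    S = len(short_texts)
    d_of = rng.integers(D, size=S)                       # snippet -> original document
    z_of = [rng.integers(K, size=len(w)) for w in short_texts]  # per-token topics
    n_dk = np.zeros((D, K)); n_kv = np.zeros((K, V))     # count tables
    n_k = np.zeros(K); n_d = np.zeros(D)
    for s, words in enumerate(short_texts):
        for w, z in zip(words, z_of[s]):
            n_dk[d_of[s], z] += 1; n_kv[z, w] += 1; n_k[z] += 1; n_d[d_of[s]] += 1
    for _ in range(iters):
        for s, words in enumerate(short_texts):
            # 1) resample the snippet's original-document membership
            d = d_of[s]
            for z in z_of[s]:
                n_dk[d, z] -= 1; n_d[d] -= 1
            logp = np.zeros(D)
            for dd in range(D):                          # Dirichlet-multinomial predictive
                cnt = n_dk[dd].copy(); tot = n_d[dd]
                for z in z_of[s]:
                    logp[dd] += np.log(cnt[z] + alpha) - np.log(tot + K * alpha)
                    cnt[z] += 1; tot += 1
            p = np.exp(logp - logp.max()); p /= p.sum()
            d = int(rng.choice(D, p=p)); d_of[s] = d
            for z in z_of[s]:
                n_dk[d, z] += 1; n_d[d] += 1
            # 2) resample each token's topic given the new membership (collapsed LDA step)
            for i, w in enumerate(words):
                z = z_of[s][i]
                n_dk[d, z] -= 1; n_kv[z, w] -= 1; n_k[z] -= 1
                p = (n_dk[d] + alpha) * (n_kv[:, w] + beta) / (n_k + V * beta)
                z = int(rng.choice(K, p=p / p.sum()))
                z_of[s][i] = z
                n_dk[d, z] += 1; n_kv[z, w] += 1; n_k[z] += 1
    return (n_kv + beta) / (n_k[:, None] + V * beta)     # topic-word estimate

# Toy usage on synthetic word-id lists (hypothetical data):
docs = [rng.integers(30, size=rng.integers(3, 8)).tolist() for _ in range(20)]
phi = ltm_gibbs(docs, V=30, K=5, D=8, iters=20)
print(phi.shape)  # (5, 30): one word distribution per topic

Jointly resampling the membership of a whole snippet (rather than of individual tokens) is what makes the aggregation adaptive: snippets with similar topic profiles come to share the same pseudo-document, which in turn sharpens their topic estimates.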
Year
2018
DOI
10.1007/s10115-017-1099-0
Venue
Knowl. Inf. Syst.
Keywords
Short text, Topic modeling, Original document, Fast inference
Field
Data mining, Heuristic, Computer science, Inference, Latent variable, Artificial intelligence, Natural language processing, Overfitting, Topic model, Gibbs sampling, Machine learning
DocType
Journal
Volume
56
Issue
2
ISSN
0219-1377
Citations
6
PageRank
0.48
References
20
Authors
4
Name            Order  Citations  PageRank
Ximing Li       1      441        3.97
Changchun Li    2      11         1.89
Jinjin Chi      3      15         3.41
Jihong OuYang   4      941        5.66