Title
Short text topic modeling by exploring original documents.
Abstract
Topic modeling for short texts faces a tough challenge owing to the sparsity problem. An effective solution is to aggregate short texts into long pseudo-documents before training a standard topic model. The main concern with this solution is how the short texts are aggregated. A recently developed self-aggregation-based topic model (SATM) can adaptively aggregate short texts without using heuristic information. However, the model definition of SATM is somewhat rigid, and more importantly, it tends to overfit and is time-consuming on large-scale corpora. To improve on SATM, we propose a generalized topic model for short texts, namely the latent topic model (LTM). In LTM, we assume that the observable short texts are snippets of normal long texts (namely, original documents) generated by a given standard topic model, but that their original-document memberships are unknown. With Gibbs sampling, LTM drives an adaptive aggregation process for short texts and simultaneously estimates the other latent variables of interest. Additionally, we propose a mini-batch scheme for fast inference. Experimental results indicate that LTM is competitive with state-of-the-art baseline models on short text topic modeling.
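The self-aggregation idea described in the abstract lends itself to a compact illustration. The sketch below is a toy collapsed Gibbs sampler in the spirit of LTM, not the authors' reference implementation: each short text is treated as a snippet of one of D latent original documents whose topic mixtures follow a standard LDA prior, and the sampler alternates between resampling each snippet's document membership and resampling per-token topics. The function name ltm_gibbs, the hyperparameter defaults (K, D, alpha, beta), and the synthetic data are all assumptions for illustration; the paper's mini-batch scheme for fast inference is not shown.

import numpy as np

rng = np.random.default_rng(0)

def ltm_gibbs(short_texts, V, K=10, D=50, alpha=0.1, beta=0.01, iters=100):
    # Toy Gibbs sampler in the spirit of LTM (hypothetical names/defaults):
    # each short text is a snippet of one of D latent "original documents"
    # generated by standard LDA with symmetric priors alpha, beta.
    S = len(short_texts)
    d_of = rng.integers(D, size=S)                       # snippet -> original document
    z_of = [rng.integers(K, size=len(w)) for w in short_texts]  # per-token topics
    n_dk = np.zeros((D, K)); n_kv = np.zeros((K, V))     # count tables
    n_k = np.zeros(K); n_d = np.zeros(D)
    for s, words in enumerate(short_texts):
        for w, z in zip(words, z_of[s]):
            n_dk[d_of[s], z] += 1; n_kv[z, w] += 1; n_k[z] += 1; n_d[d_of[s]] += 1
    for _ in range(iters):
        for s, words in enumerate(short_texts):
            # 1) resample the snippet's original-document membership
            d = d_of[s]
            for z in z_of[s]:
                n_dk[d, z] -= 1; n_d[d] -= 1
            logp = np.zeros(D)
            for dd in range(D):                          # Dirichlet-multinomial predictive
                cnt = n_dk[dd].copy(); tot = n_d[dd]
                for z in z_of[s]:
                    logp[dd] += np.log(cnt[z] + alpha) - np.log(tot + K * alpha)
                    cnt[z] += 1; tot += 1
            p = np.exp(logp - logp.max()); p /= p.sum()
            d = int(rng.choice(D, p=p)); d_of[s] = d
            for z in z_of[s]:
                n_dk[d, z] += 1; n_d[d] += 1
            # 2) resample each token's topic given the new membership (collapsed LDA step)
            for i, w in enumerate(words):
                z = z_of[s][i]
                n_dk[d, z] -= 1; n_kv[z, w] -= 1; n_k[z] -= 1
                p = (n_dk[d] + alpha) * (n_kv[:, w] + beta) / (n_k + V * beta)
                z = int(rng.choice(K, p=p / p.sum()))
                z_of[s][i] = z
                n_dk[d, z] += 1; n_kv[z, w] += 1; n_k[z] += 1
    return (n_kv + beta) / (n_k[:, None] + V * beta)     # topic-word estimate

# Toy usage on synthetic word-id lists (hypothetical data):
docs = [rng.integers(30, size=rng.integers(3, 8)).tolist() for _ in range(20)]
phi = ltm_gibbs(docs, V=30, K=5, D=8, iters=20)
print(phi.shape)  # (5, 30): one word distribution per topic

Jointly resampling the membership of a whole snippet (rather than of individual tokens) is what makes the aggregation adaptive: snippets with similar topic profiles come to share the same pseudo-document, which in turn sharpens their topic estimates.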
Year
2018
DOI
10.1007/s10115-017-1099-0
Venue
Knowl. Inf. Syst.
Keywords
Short text, Topic modeling, Original document, Fast inference
Field
Data mining, Heuristic, Computer science, Inference, Latent variable, Artificial intelligence, Natural language processing, Overfitting, Topic model, Gibbs sampling, Machine learning
DocType
Journal
Volume
56
Issue
2
ISSN
0219-1377
Citations
6
PageRank
0.48
References
20
Authors
4
Name            Order  Citations  PageRank
Ximing Li       1      441        3.97
Changchun Li    2      11         1.89
Jinjin Chi      3      15         3.41
Jihong OuYang   4      941        5.66