Title
TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement
Abstract
Short texts have been prevalent on websites and emerging social media for several years, making it critical to identify intelligible topics from online data sources. However, existing topic models for short texts cannot analyze the internal components of the learned topics, an analysis that matters for improving topic coherence and interpretability. In this paper, we propose a novel topic model for short texts, named TSSE-DMM, which improves the coherence and interpretability of topics through topic subdivision and alleviates text sparsity through a semantic enhancement strategy. First, we subdivide each topic into four detailed aspects, namely the location aspect, the people & organization aspect, the core word aspect, and the background word aspect, to obtain distinct and interpretable components of topics. Then, we combine the Generalized Polya Urn model with a joint word embedding to address data sparsity. Extensive experiments carried out on three real-world text collections in two languages show that our model achieves better topic representations than the baseline methods. Moreover, our method has been adopted by the public service hotline platform of Jiangsu Province, China.
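The abstract does not give the model's update rules, so the following is only a minimal sketch of the general Generalized Polya Urn idea as it is commonly used in short-text topic models, not the TSSE-DMM implementation. When a word is assigned to a topic, words that are close to it in embedding space also receive a fractional count for that topic, which densifies sparse co-occurrence statistics. The function names, the cosine-similarity criterion, and the `threshold` and `weight` values are all assumptions made for illustration.

```python
import numpy as np

def build_promotion_matrix(embeddings, threshold=0.5, weight=0.3):
    """Return promo, where promo[w, v] is the count added to word v
    whenever word w is assigned to a topic.

    Hypothetical rule: v is promoted with `weight` if its cosine
    similarity to w is at least `threshold`; w itself always gets 1.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    sim = unit @ unit.T                              # pairwise cosine similarity
    promo = np.where(sim >= threshold, weight, 0.0)
    np.fill_diagonal(promo, 1.0)                     # the sampled word counts fully
    return promo

def gpu_update(topic_word_counts, topic, word, promo, sign=1):
    """Add (sign=1) or remove (sign=-1) a word and its promoted
    neighbours from the given topic's word counts, in place."""
    topic_word_counts[topic] += sign * promo[word]

# Toy usage: three words with 2-d embeddings, two topics.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
promo = build_promotion_matrix(embeddings)
counts = np.zeros((2, 3))
gpu_update(counts, topic=0, word=0, promo=promo)
```

In a Gibbs sampler, the same function would be called with `sign=-1` before resampling a document's topic and with `sign=1` afterwards, so the promoted counts stay consistent.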
Year
2021
DOI
10.1007/978-3-030-75765-6_51
Venue
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II
Keywords
Topic model, Topic subdivision, Semantic enhancement
DocType
Conference
Volume
12713
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
6
Name            Order  Citations  PageRank
Chengcheng Mai  1      0          0.34
Xueming Qiu     2      0          0.34
Kaiwen Luo      3      0          0.34
Min Chen        4      5          8.48
Bo Zhao         5      0          0.34
Yihua Huang     6      8          6.61