Title
A Pseudo-document-based Topical N-grams model for short texts
Abstract
In recent years, short text topic modeling has drawn considerable attention from interdisciplinary researchers. Various customized topic models have been proposed to tackle the semantic sparseness of short texts. Most (if not all) of them follow the bag-of-words assumption, which, however, is not adequate, since word order and phrases are often critical to capturing the meaning of texts. On the other hand, while some existing topic models are sensitive to word order, they do not perform well on short texts due to severe data sparseness. To address these issues, we propose the Pseudo-document-based Topical N-Grams model (PTNG), which alleviates the data sparsity problem of short texts while remaining sensitive to word order. Extensive experiments on three real-world data sets with state-of-the-art baselines demonstrate the high quality of the topics learned by PTNG according to UCI coherence scores, and its more discriminative semantic representation of short texts according to classification results.
Year: 2020
DOI: 10.1007/s11280-020-00814-x
Venue: WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS
Keywords: Short text, Topic model, Word order, Topical N-Grams
DocType: Journal
Volume: 23
Issue: 6
ISSN: 1386-145X
Citations: 0
PageRank: 0.34
References: 30
Authors: 6
Name          Order  Citations  PageRank
Hao Lin       1      43         3.50
Yuan Zuo      2      67         5.34
Guannan Liu   3      0          0.68
Hong Li       4      38         1.72
Junjie Wu     5      812        37.97
Zhiang Wu     6      359        37.24