Title
The dual-sparse topic model: mining focused topics and focused terms in short text
Abstract
Topic modeling has proven to be an effective method for exploratory text mining. Most topic models assume that a document is generated from a mixture of topics. In real-world scenarios, however, individual documents usually concentrate on a few salient topics instead of covering a wide variety of topics. Likewise, a real topic uses a narrow range of terms instead of covering much of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are characterized by extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and each topic to select a few focused terms. Experiments on large corpora of different genres demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. The improvement is especially notable on collections of short documents.
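The abstract describes the "Spike and Slab" prior only in words. As a rough illustration, the Python sketch below forward-samples one sparse-but-smooth distribution in the generic spike-and-slab style: a Bernoulli "spike" selector switches each component on or off, a Dirichlet "slab" spreads mass smoothly over the switched-on components, and a tiny weak prior keeps the switched-off components near zero while keeping the Dirichlet well defined. This is a sketch of the general idea under stated assumptions, not the authors' exact model or inference procedure; the function name `sample_spike_and_slab_dirichlet`, its parameters `select_prob`, `smooth`, and `weak_smooth`, and the at-least-one-component safeguard are all hypothetical. The routine is applied twice, once for a document's topic mixture and once for a topic's word distribution, to mirror the "dual" sparsity described in the abstract.

```python
import random


def sample_spike_and_slab_dirichlet(num_components, select_prob, smooth,
                                    weak_smooth, rng):
    """Forward-sample one spike-and-slab Dirichlet draw (illustrative only).

    Spike: each component k is switched on independently with probability
    select_prob, giving binary selectors b_k.
    Slab: the Dirichlet concentration is smooth * b_k + weak_smooth, so
    switched-on components share mass smoothly while switched-off ones get
    only the tiny weak_smooth and therefore near-zero probability.
    Returns (selectors, probabilities).
    """
    if num_components < 1 or smooth <= 0 or weak_smooth <= 0:
        raise ValueError("need num_components >= 1, smooth > 0, weak_smooth > 0")

    # Spike: Bernoulli selectors decide which components are "focused".
    selectors = [1 if rng.random() < select_prob else 0
                 for _ in range(num_components)]
    if not any(selectors):
        # Hypothetical safeguard: keep at least one component switched on.
        selectors[rng.randrange(num_components)] = 1

    # Slab: Dirichlet parameters that separate sparsity from smoothness.
    alphas = [smooth * b + weak_smooth for b in selectors]

    # Dirichlet sample via independent Gamma(alpha, 1) draws, normalized.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return selectors, [g / total for g in gammas]


if __name__ == "__main__":
    rng = random.Random(0)
    # Document-topic mixture over 20 topics: only a few topics switched on.
    doc_sel, theta = sample_spike_and_slab_dirichlet(20, 0.15, 0.5, 1e-3, rng)
    # Topic-word distribution over a 1,000-term vocabulary: focused terms.
    word_sel, phi = sample_spike_and_slab_dirichlet(1000, 0.05, 0.1, 1e-3, rng)
    print(sum(doc_sel), "of 20 topics on;", sum(word_sel), "of 1000 terms on")
    # Probability mass held by the selected topics.
    print(round(sum(p for p, b in zip(theta, doc_sel) if b), 3))
```

The decoupling shows up as two separate knobs: `select_prob` controls how many components are active (sparsity), while `smooth` controls how evenly mass spreads among the active ones (smoothness). In a fitted model the selectors would be inferred from data rather than sampled forward as they are here.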
Year
2014
DOI
10.1145/2566486.2567980
Venue
WWW
Keywords
topic modeling, individual document, topic mixture, salient topic, real topic, classical topic model, short text, topic model, focused topic, dual-sparse topic model, sparsity-enhanced topic model, sparse representation, user generated content
Field
Data mining, Word usage, Computer science, Artificial intelligence, User-generated content, World Wide Web, Social media, Information retrieval, Sparse approximation, Topic model, Vocabulary, Web content, Machine learning, Salient
DocType
Conference
Citations
28
PageRank
0.81
References
26
Authors
4
Name          Order  Citations  PageRank
Tianyi Lin    1      147        11.79
Wentao Tian   2      146        4.23
Qiaozhu Mei   3      4395       207.09
Hong Cheng    4      3694       148.72