Abstract |
---|
Topic modeling has proven to be an effective method for exploratory text mining. Most topic models assume that a document is generated from a mixture of topics. In real-world scenarios, however, an individual document usually concentrates on a few salient topics rather than covering a wide variety of topics, and a real topic adopts a narrow range of terms rather than a wide coverage of the vocabulary. Modeling this sparsity is especially important for analyzing user-generated Web content and social media, which typically consist of extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and individual topics to select a few focused terms. Experiments on large corpora of different genres demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. The improvement is especially notable on collections of short documents. |
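The decoupling idea in the abstract can be sketched in a few lines: Bernoulli "spike" variables pick which topics a document focuses on, and a Dirichlet "slab" smooths the weights over only the selected topics. This is a minimal illustrative sketch, not the paper's inference procedure; the function name and parameters (`select_prob`, `smoothness`) are hypothetical stand-ins for the model's selector and smoothing hyperparameters.

```python
import numpy as np

def spike_and_slab_mixture(n_topics, select_prob, smoothness, rng):
    """Draw a sparse topic mixture for one document.

    Spike: Bernoulli selectors decide which topics are active.
    Slab:  a Dirichlet distributes mass smoothly over the active topics.
    """
    selected = rng.random(n_topics) < select_prob   # spike: topic selectors
    if not selected.any():                          # guarantee >= 1 active topic
        selected[rng.integers(n_topics)] = True
    theta = np.zeros(n_topics)
    # slab: Dirichlet over the selected topics only; unselected stay exactly 0
    theta[selected] = rng.dirichlet(np.full(selected.sum(), smoothness))
    return theta

rng = np.random.default_rng(0)
theta = spike_and_slab_mixture(n_topics=10, select_prob=0.3, smoothness=0.5, rng=rng)
```

The same construction applies symmetrically to topic-word distributions, giving the "dual" sparsity: each topic selects a focused subset of the vocabulary before smoothing.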
Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2566486.2567980 | WWW |
Keywords | Field | DocType |
---|---|---
topic modeling,individual document,topic mixture,salient topic,real topic,classical topic model,short text,topic model,focused topic,dual-sparse topic model,sparsity-enhanced topic model,sparse representation,user generated content | Data mining,Word usage,Computer science,Artificial intelligence,User-generated content,World Wide Web,Social media,Information retrieval,Sparse approximation,Topic model,Vocabulary,Web content,Machine learning,Salient | Conference |
Citations | PageRank | References |
---|---|---
28 | 0.81 | 26 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---
Tianyi Lin | 1 | 147 | 11.79 |
Wentao Tian | 2 | 146 | 4.23 |
Qiaozhu Mei | 3 | 4395 | 207.09 |
Hong Cheng | 4 | 3694 | 148.72 |