Title
Scalable topical phrase mining from text corpora
Abstract
While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either post-processes the results of unigram-based topic models or relies on complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or scale poorly even on moderately sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework, which segments a document into single- and multi-word phrases, with a new topic model that operates on the induced document partition. Our approach discovers high-quality topical phrases at negligible extra cost over the bag-of-words topic model on a variety of datasets, including research publication titles, abstracts, reviews, and news articles.
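The abstract does not spell out how a document is segmented into phrases. As an illustration only, the Python sketch below assumes a two-stage scheme: first count frequent contiguous phrases, pruning any n-gram whose prefix or suffix is infrequent; then, within each document, greedily merge the adjacent pair of units with the highest significance score, (f(P1P2) - mu0) / sqrt(f(P1P2)) with mu0 = f(P1) f(P2) / L, until no pair reaches a threshold. The function names, the threshold, the toy corpus, and taking L to be the corpus token count are assumptions made for this sketch, not the authors' implementation; the topic-model stage is not shown.

```python
import math
from collections import Counter

def mine_frequent_phrases(docs, min_support, max_len=4):
    """Return {phrase_tuple: count} for contiguous phrases seen >= min_support times."""
    freq = {}
    for n in range(1, max_len + 1):
        counts = Counter()
        for doc in docs:
            for i in range(len(doc) - n + 1):
                gram = tuple(doc[i:i + n])
                # Downward closure: an n-gram can only be frequent if its
                # (n-1)-token prefix and suffix are both frequent.
                if n == 1 or (gram[:-1] in freq and gram[1:] in freq):
                    counts[gram] += 1
        frequent = {g: c for g, c in counts.items() if c >= min_support}
        if not frequent:
            break  # no frequent n-grams implies no frequent longer ones
        freq.update(frequent)
    return freq

def significance(left, right, freq, total):
    """(f(merged) - mu0) / sqrt(f(merged)), with mu0 = f(left) * f(right) / total.

    Assumption: `total` (L) is the corpus token count.
    """
    merged_count = freq.get(left + right, 0)
    if merged_count == 0:
        return float("-inf")  # merged phrase is not frequent: never merge
    mu0 = freq.get(left, 0) * freq.get(right, 0) / total
    return (merged_count - mu0) / math.sqrt(merged_count)

def segment(doc, freq, total, threshold):
    """Greedy bottom-up merging of adjacent units until no pair reaches threshold."""
    units = [(tok,) for tok in doc]
    while len(units) > 1:
        best_i, best_score = None, None
        for i in range(len(units) - 1):
            score = significance(units[i], units[i + 1], freq, total)
            if score >= threshold and (best_score is None or score > best_score):
                best_i, best_score = i, score
        if best_i is None:
            break  # no adjacent pair is significant enough to merge
        units[best_i:best_i + 2] = [units[best_i] + units[best_i + 1]]
    return [" ".join(u) for u in units]

# Hypothetical toy corpus for illustration.
docs = [s.split() for s in [
    "support vector machine for text",
    "support vector machine learning",
    "a support vector machine",
    "text mining for data",
]]
freq = mine_frequent_phrases(docs, min_support=2, max_len=4)
total = sum(len(d) for d in docs)  # 17 tokens
for d in docs:
    print(segment(d, freq, total, threshold=1.0))
# ['support vector machine', 'for', 'text']
# ['support vector machine', 'learning']
# ['a', 'support vector machine']
# ['text', 'mining', 'for', 'data']
```

Raising the threshold makes merging stricter: with threshold=2.0, every toy document stays segmented into single tokens, since the best pair here scores about 1.43.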
Year: 2014
DOI: 10.14778/2735508.2735519
Venue: VLDB
DocType: Journal
Volume: 8
Issue: 3
Proceedings of the VLDB Endowment, Vol. 8(3), pp. 305–316, 2014
Citations: 56
PageRank: 1.52
References: 19
Authors: 5
Name | Order | Citations | PageRank
Ahmed El-Kishky | 1 | 56 | 1.86
Yanglei Song | 2 | 68 | 4.09
Chi Wang | 3 | 2014 | 81.08
Clare R. Voss | 4 | 344 | 29.51
Jiawei Han | 5 | 43085 | 3824.48