Title
Efficient Methods for Inferring Large Sparse Topic Hierarchies
Abstract
Latent variable topic models such as Latent Dirichlet Allocation (LDA) can discover topics from text in an unsupervised fashion. However, scaling the models up to the many distinct topics exhibited in modern corpora is challenging. "Flat" topic models like LDA have difficulty modeling sparsely expressed topics, and richer hierarchical models become computationally intractable as the number of topics increases. In this paper, we introduce efficient methods for inferring large topic hierarchies. Our approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. We show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. We introduce a collapsed sampler for the model that exploits sparsity and the tree structure in order to make inference efficient. In experiments with multiple data sets, we show that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA.
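The abstract's central idea, that the SBT "organizes the latent topics as leaves in a tree" and that inference "exploits sparsity and the tree structure", can be made concrete with a small sketch. The Python below is a hypothetical illustration of a tree-structured backoff prior, not the paper's actual SBT formulation: the class name, the complete-binary-tree layout, and the per-level discounting scheme are all assumptions made for exposition.

import math
from collections import defaultdict

class BackoffTreePrior:
    """Illustrative tree-structured backoff prior over topic indices.

    Topics sit at the leaves of a complete binary tree (heap indexing:
    leaf t has index num_leaves + t, parent of node n is n // 2).
    When a topic is observed, a discount deltas[l] per unit of count is
    pooled at its level-l ancestor; pooled mass is shared uniformly by
    all leaves under that ancestor, so topics near active topics in the
    tree receive more prior mass than distant ones.  NOTE: this scheme
    is an assumption for illustration, not the paper's SBT.
    """

    def __init__(self, num_topics, deltas):
        self.depth = max(1, math.ceil(math.log2(num_topics)))
        self.num_leaves = 2 ** self.depth
        self.deltas = list(deltas)           # one discount per level, leaf-to-root
        assert len(self.deltas) == self.depth and sum(self.deltas) < 1.0
        self.keep = 1.0 - sum(self.deltas)   # fraction of mass kept at the leaf
        self.counts = defaultdict(float)     # sparse: topic -> count
        self.pooled = defaultdict(float)     # sparse: internal node -> pooled mass

    def observe(self, topic, amount=1.0):
        """Record `amount` observations of `topic`, spreading discounts upward."""
        self.counts[topic] += amount
        node = self.num_leaves + topic       # heap index of the leaf
        for level in range(self.depth):      # walk leaf -> root
            node //= 2
            self.pooled[node] += self.deltas[level] * amount

    def score(self, topic):
        """Unnormalized prior mass of `topic`: its own discounted count
        plus a uniform share of every ancestor's pooled mass."""
        node = self.num_leaves + topic
        mass = self.keep * self.counts.get(topic, 0.0)
        leaves_below = 1
        while node > 1:                      # walk leaf -> root
            node //= 2
            leaves_below *= 2
            mass += self.pooled.get(node, 0.0) / leaves_below
        return mass

if __name__ == "__main__":
    # Observing topic 3 ten times leaves the total mass at 10 but spreads
    # some of it to nearby leaves via the ancestors' pools.
    prior = BackoffTreePrior(num_topics=8, deltas=[0.1, 0.05, 0.02])
    prior.observe(3, amount=10.0)
    print(prior.score(3), prior.score(2), prior.score(7))

Under these assumed parameters, topic 2 (which shares topic 3's immediate parent) scores 0.65 while the distant topic 7 scores only 0.025, which is the backoff behavior the tree structure is meant to provide; because counts and pooled mass are stored sparsely, scoring a topic touches only its root-to-leaf path, the property that makes sampling in very large topic spaces tractable.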
Year: 2015
Venue: PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1
Field: Latent Dirichlet allocation, Inference, Computer science, Exploit, Latent variable, Artificial intelligence, Tree structure, Natural language processing, Topic model, Hierarchy, Scaling, Machine learning
DocType: Conference
Volume: P15-1
Citations: 4
PageRank: 0.39
References: 16
Authors: 3
Name                        Order  Citations  PageRank
Doug Downey                 1      1908       119.79
Chandra Sekhar Bhagavatula  2      141        14.46
Yi Yang                     3      27         1.68