Title
Efficient Methods for Inferring Large Sparse Topic Hierarchies
Abstract
Latent variable topic models such as Latent Dirichlet Allocation (LDA) can discover topics from text in an unsupervised fashion. However, scaling the models up to the many distinct topics exhibited in modern corpora is challenging. "Flat" topic models like LDA have difficulty modeling sparsely expressed topics, and richer hierarchical models become computationally intractable as the number of topics increases. In this paper, we introduce efficient methods for inferring large topic hierarchies. Our approach is built upon the Sparse Backoff Tree (SBT), a new prior for latent topic distributions that organizes the latent topics as leaves in a tree. We show how a document model based on SBTs can effectively infer accurate topic spaces of over a million topics. We introduce a collapsed sampler for the model that exploits sparsity and the tree structure in order to make inference efficient. In experiments with multiple data sets, we show that scaling to large topic spaces results in much more accurate models, and that SBT document models make use of large topic spaces more effectively than flat LDA.
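The abstract's central idea, that the SBT "organizes the latent topics as leaves in a tree" and that inference "exploits sparsity and the tree structure", can be made concrete with a small sketch. The Python below is a hypothetical illustration of a tree-structured backoff prior, not the paper's actual SBT formulation: the class name, the complete-binary-tree layout, and the per-level discounting scheme are all assumptions made for exposition.

import math
from collections import defaultdict

class BackoffTreePrior:
    """Illustrative tree-structured backoff prior over topic indices.

    Topics sit at the leaves of a complete binary tree (heap indexing:
    leaf t has index num_leaves + t, parent of node n is n // 2).
    When a topic is observed, a discount deltas[l] per unit of count is
    pooled at its level-l ancestor; pooled mass is shared uniformly by
    all leaves under that ancestor, so topics near active topics in the
    tree receive more prior mass than distant ones.  NOTE: this scheme
    is an assumption for illustration, not the paper's SBT.
    """

    def __init__(self, num_topics, deltas):
        self.depth = max(1, math.ceil(math.log2(num_topics)))
        self.num_leaves = 2 ** self.depth
        self.deltas = list(deltas)           # one discount per level, leaf-to-root
        assert len(self.deltas) == self.depth and sum(self.deltas) < 1.0
        self.keep = 1.0 - sum(self.deltas)   # fraction of mass kept at the leaf
        self.counts = defaultdict(float)     # sparse: topic -> count
        self.pooled = defaultdict(float)     # sparse: internal node -> pooled mass

    def observe(self, topic, amount=1.0):
        """Record `amount` observations of `topic`, spreading discounts upward."""
        self.counts[topic] += amount
        node = self.num_leaves + topic       # heap index of the leaf
        for level in range(self.depth):      # walk leaf -> root
            node //= 2
            self.pooled[node] += self.deltas[level] * amount

    def score(self, topic):
        """Unnormalized prior mass of `topic`: its own discounted count
        plus a uniform share of every ancestor's pooled mass."""
        node = self.num_leaves + topic
        mass = self.keep * self.counts.get(topic, 0.0)
        leaves_below = 1
        while node > 1:                      # walk leaf -> root
            node //= 2
            leaves_below *= 2
            mass += self.pooled.get(node, 0.0) / leaves_below
        return mass

if __name__ == "__main__":
    # Observing topic 3 ten times leaves the total mass at 10 but spreads
    # some of it to nearby leaves via the ancestors' pools.
    prior = BackoffTreePrior(num_topics=8, deltas=[0.1, 0.05, 0.02])
    prior.observe(3, amount=10.0)
    print(prior.score(3), prior.score(2), prior.score(7))

Under these assumed parameters, topic 2 (which shares topic 3's immediate parent) scores 0.65 while the distant topic 7 scores only 0.025, which is the backoff behavior the tree structure is meant to provide; because counts and pooled mass are stored sparsely, scoring a topic touches only its root-to-leaf path, the property that makes sampling in very large topic spaces tractable.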
Year: 2015
Venue: PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1
Field: Latent Dirichlet allocation, Inference, Computer science, Exploit, Latent variable, Artificial intelligence, Tree structure, Natural language processing, Topic model, Hierarchy, Scaling, Machine learning
DocType: Conference
Volume: P15-1
Citations: 4
PageRank: 0.39
References: 16
Authors: 3
Name                        Order  Citations  PageRank
Doug Downey                 1      1908       119.79
Chandra Sekhar Bhagavatula  2      141        14.46
Yi Yang                     3      27         1.68