Title
Fast collapsed Gibbs sampling for latent Dirichlet allocation
Abstract
In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora, FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, which requires about 6 CPU months of computation for LDA, our speedup of 5.7 can save 5 CPU months of computation.
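For context on the O(K) per-sample cost the abstract refers to, the sketch below shows the conventional collapsed Gibbs update for LDA (not the paper's FastLDA refinement): for each token, a full K-dimensional conditional distribution is formed before a topic is drawn. This is a minimal illustration only; all variable names (n_dk, n_kw, n_k, etc.) are assumptions, not the paper's notation.

```python
import numpy as np

def collapsed_gibbs_step(doc_tokens, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of the standard collapsed Gibbs sampler for LDA over a
    single document. Hypothetical variable names; O(K) work per token.

    doc_tokens : word ids for this document
    z          : current topic assignment for each token in the document
    n_dk       : topic counts for this document, shape (K,)
    n_kw       : topic-word counts, shape (K, V)
    n_k        : total token count per topic, shape (K,)
    """
    K, V = n_kw.shape
    for i, w in enumerate(doc_tokens):
        k_old = z[i]
        # Remove the current assignment from the count statistics.
        n_dk[k_old] -= 1
        n_kw[k_old, w] -= 1
        n_k[k_old] -= 1

        # Unnormalized conditional p(z_i = k | rest). Building this full
        # K-dimensional vector is what makes each sample cost O(K) in the
        # standard scheme; FastLDA avoids touching all K topics on average.
        p = (n_dk + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)

        # Draw the new topic and restore the counts.
        k_new = rng.choice(K, p=p / p.sum())
        z[i] = k_new
        n_dk[k_new] += 1
        n_kw[k_new, w] += 1
        n_k[k_new] += 1
    return z
```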
Year
2008
DOI
10.1145/1401890.1401960
Venue
KDD
Keywords
conventional gibbs, latent dirichlet allocation, gibbs sampling method, collapsed gibbs, pubmed collection, fast sampling scheme, new method result, k operation, sampling scheme, gibbs sampler, cpu month, sampling, gibbs sampling
Field
Slice sampling, Latent Dirichlet allocation, Data set, Computer science, Sampling (statistics), Artificial intelligence, Machine learning, Gibbs sampling, Speedup, Computation, Sampling scheme
DocType
Conference
Citations
214
PageRank
13.13
References
16
Authors
6
Name
Order
Citations
PageRank
Ian Porteous126619.14
David Newman2131973.72
Alexander T. Ihler31377112.01
Arthur Asuncion478552.90
Padhraic Smyth571481451.38
Max Welling64875550.34