Title
PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing
Abstract
Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that our strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, and hence improve scalability of LDA.
Year
2011
DOI
10.1145/1961189.1961198
Venue
ACM TIST
Keywords
pipeline processing, communication bottleneck, latent dirichlet allocation, unparallelizable communication bottleneck, gibbs sampling, good load balancing, parallel latent dirichlet allocation, previous method, priority-based scheduling, data placement, lda run, distributed parallel computations, topic models, load balance
Field
Bottleneck, Latent Dirichlet allocation, Load balancing (computing), Computer science, Scheduling (computing), Parallel computing, Topic model, Gibbs sampling, Scalability, Distributed computing
DocType
Journal
Volume
2
Issue
3
ISSN
2157-6904
Citations
83
PageRank
2.33
References
16
Authors
4
Name, Order, Citations, PageRank
Zhiyuan Liu, 1, 2037, 123.68
Yuzhou Zhang, 2, 150, 7.63
Edward Y. Chang, 3, 4519, 336.59
Maosong Sun, 4, 2293, 162.86