Title
More influence means less work: fast latent Dirichlet allocation by influence scheduling
Abstract
There have recently been considerable advances in fast inference for (online) latent Dirichlet allocation (LDA). While it is widely recognized that the scheduling of documents in stochastic optimization, and in turn in LDA, may have significant consequences, this issue remains largely unexplored. Instead, practitioners schedule documents essentially uniformly at random, perhaps due to ease of implementation and to the lack of clear guidelines on scheduling documents. In this work, we address this issue and propose to schedule documents that exert a disproportionately large influence on the topics of the corpus for an update before less influential ones. More precisely, we justify sampling documents with a random bias towards those with higher norms when forming mini-batches. On several real-world datasets, including 3M articles from Wikipedia and 8M from PubMed, we demonstrate that the resulting influence-scheduled LDA can handily analyze massive document collections and finds topic models as good as or better than those found with online LDA, often in a fraction of the time.
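The abstract's core mechanism, forming mini-batches by sampling documents with probability biased towards higher norms, can be illustrated by the following minimal Python sketch. This is not the authors' implementation: the function name sample_minibatch, the choice of the L2 norm as the influence proxy, and the numpy-based sampler are assumptions made for illustration only.

    import numpy as np

    def sample_minibatch(doc_term_counts, batch_size, rng=None):
        # Hypothetical helper: pick document indices with probability
        # proportional to each document's norm, used here as a cheap
        # proxy for its influence on the corpus topics.
        rng = rng if rng is not None else np.random.default_rng()
        norms = np.linalg.norm(doc_term_counts, axis=1)
        probs = norms / norms.sum()
        return rng.choice(len(probs), size=batch_size, replace=False, p=probs)

    # Toy usage: 3 documents x 4 terms; high-norm documents are favoured
    # when drawing a mini-batch of size 2 for an online LDA update.
    X = np.array([[5., 3., 0., 1.],
                  [0., 1., 0., 0.],
                  [9., 7., 4., 2.]])
    batch_indices = sample_minibatch(X, batch_size=2)

In the paper's setting, such a biased mini-batch would then be passed to the usual online LDA update in place of a uniformly sampled one.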
Year
2011
DOI
10.1145/2063576.2063944
Venue
CIKM
Keywords
fast latent dirichlet allocation, large influence, online lda, considerable advance, practitioners schedule document, massive document collection, clear guideline, higher norm, latent dirichlet allocation, fast inference, resulting influence, influence scheduling, stochastic optimization
Field
Dynamic topic model, Data mining, Latent Dirichlet allocation, Stochastic optimization, Information retrieval, Computer science, Inference, Scheduling (computing), Artificial intelligence, Topic model, Machine learning
DocType
Conference
Citations
3
PageRank
0.42
References
8
Authors
4
Name                 Order  Citations  PageRank
Mirwaes Wahabzada    1      103        7.41
Kristian Kersting    2      1932       154.03
Anja Pilz            3      32         2.83
Christian Bauckhage  4      1979       195.86