Title
A More Time-Efficient Gibbs Sampling Algorithm Based on SparseLDA for Latent Dirichlet Allocation
Abstract
As an efficient sampling algorithm for latent Dirichlet allocation (LDA), SparseLDA uses a caching strategy to improve on the time and space efficiency of the standard Gibbs sampling algorithm (StdGibbs) by recycling previous computation. However, the amount of computation SparseLDA can recycle is limited, which caps its speedup over StdGibbs: the word types of two adjacent tokens usually differ, so the previous computation cannot easily be reused. To solve this problem, we propose a new algorithm based on SparseLDA, named Efficient SparseLDA (ESparseLDA). Its main idea is to first rearrange the tokens within each text by word type, so that tokens of the same word type are grouped together, and then to recycle more computation without making any approximation, so that the sampler remains exact. We give detailed theoretical explanations and comparative experimental analyses of the correctness, exactness, and time efficiency of ESparseLDA. Statistical significance tests on perplexity show that ESparseLDA is correct and exact, and the running-time results show that its time efficiency is 5.06% to 31.85% higher than that of SparseLDA, depending on the dataset.
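The rearrangement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `group_tokens_by_word_type` and `count_adjacent_repeats` and the toy document are assumptions made for illustration. It shows only that grouping tokens by word type increases the number of adjacent token pairs that share a word type, which is where a sampler could reuse word-dependent cached terms. Because LDA treats each document as a bag of words, reordering tokens within a document does not change the model.

```python
def group_tokens_by_word_type(doc):
    """Return token positions reordered so equal word types are adjacent.

    Hypothetical helper: word types are ordered by first appearance, and
    Python's stable sort keeps the original order within each word type.
    """
    first_seen = {}
    for pos, word in enumerate(doc):
        first_seen.setdefault(word, pos)
    return sorted(range(len(doc)), key=lambda pos: first_seen[doc[pos]])


def count_adjacent_repeats(words):
    """Count adjacent token pairs with the same word type.

    Each such pair is a point where cached, word-dependent terms from the
    previous token could in principle be recycled.
    """
    return sum(1 for a, b in zip(words, words[1:]) if a == b)


# Toy document (an assumption for illustration, not data from the paper).
doc = ["topic", "model", "topic", "gibbs", "model", "topic"]
order = group_tokens_by_word_type(doc)
rearranged = [doc[i] for i in order]

print(count_adjacent_repeats(doc))         # 0
print(rearranged)
print(count_adjacent_repeats(rearranged))  # 3
```

In this toy case the original ordering offers no adjacent repeats, while the grouped ordering offers three. How much computation can actually be recycled per repeat depends on the sampler's caching scheme, which this sketch does not model.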
Year: 2018
DOI: 10.3233/IDA-173609
Venue: INTELLIGENT DATA ANALYSIS
Keywords: Latent Dirichlet allocation, topic model, Gibbs sampling, topic inference
Field: Latent Dirichlet allocation, Computer science, Algorithm, Gibbs sampling
DocType: Journal
Volume: 22
Issue: 6
ISSN: 1088-467X
Citations: 0
PageRank: 0.34
References: 26
Authors: 3
Name            Order  Citations  PageRank
Xiaotang Zhou   1      19         4.08
Jihong OuYang   2      94         15.66
Ximing Li       3      44         13.97