Title
Stochastic Variational Inference-Based Parallel and Online Supervised Topic Model for Large-Scale Text Processing.
Abstract
Topic modeling is a mainstream and effective technique for handling text data, with wide applications in text analysis, natural language processing, personalized recommendation, computer vision, etc. Among the known topic models, supervised Latent Dirichlet Allocation (sLDA) is a popular and competitive supervised topic model. However, as datasets grow in scale, sLDA becomes increasingly inefficient and time-consuming, which restricts its range of applications. To address this problem, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make the training procedure faster and more efficient, and a parallel computing mechanism implemented via the MapReduce framework is proposed to support cloud computing and big data processing. The online training capability of PO-sLDA broadens the application scope of the approach, making it suitable for real-life applications with high real-time demands. Validation on two datasets of different sizes shows that the proposed approach achieves accuracy comparable to that of sLDA and can efficiently accelerate the training procedure. Moreover, its good convergence and online training capability make it attractive for large-scale text data analysis and processing.
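The abstract names two ingredients, stochastic variational inference and a MapReduce-style parallel step, without giving update rules. The sketch below illustrates how the two are commonly combined for the topic-word parameters of a plain LDA-style model; it is not the PO-sLDA algorithm from the paper. The supervised response parameters and the per-document local step are omitted, and all names (step_size, map_partition, svi_update, tau0, kappa, eta) are hypothetical choices made for illustration.

```python
from functools import reduce


def step_size(t, tau0=1.0, kappa=0.7):
    """Robbins-Monro step size rho_t = (tau0 + t) ** (-kappa)."""
    return (tau0 + t) ** (-kappa)


def add_matrices(a, b):
    """Element-wise sum of two topics-by-words matrices (lists of lists)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def map_partition(doc_stats):
    """Map step: sum the per-document expected topic-word counts of one partition.

    Each element of doc_stats is assumed to be the output of a local
    variational step, which is not shown in this sketch.
    """
    return reduce(add_matrices, doc_stats)


def svi_update(lam, partitions, t, corpus_size, eta=0.01):
    """Reduce step plus one stochastic update of the global parameter lam.

    partitions: list of partitions, each a list of per-document statistics.
    corpus_size: assumed total number of documents; rescales the minibatch.
    """
    # Reduce: combine the partial sums produced by the (conceptual) workers.
    partials = [map_partition(p) for p in partitions if p]
    batch_stats = reduce(add_matrices, partials)
    batch_size = sum(len(p) for p in partitions)
    scale = corpus_size / batch_size
    rho = step_size(t)
    # Blend the old parameter with the minibatch-based estimate.
    return [
        [(1 - rho) * l + rho * (eta + scale * s) for l, s in zip(l_row, s_row)]
        for l_row, s_row in zip(lam, batch_stats)
    ]
```

Because the step size shrinks as t grows, later minibatches move the global parameter less. That decay is what lets this kind of update run on a stream of documents, which matches the online setting the abstract describes. The function expects at least one non-empty partition.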
Year: 2018
DOI: 10.1007/s11390-018-1871-y
Venue: J. Comput. Sci. Technol.
Keywords: topic modeling, large-scale text classification, stochastic variational inference, cloud computing, online learning
Field: Big data processing, Convergence (routing), Latent Dirichlet allocation, Computer science, Inference, Natural language, Artificial intelligence, Topic model, Machine learning, Text processing, Distributed computing, Cloud computing
DocType: Journal
Volume: 33
Issue: 5
ISSN: 1000-9000
Citations: 0
PageRank: 0.34
References: 15
Authors: 3
Name          Order  Citations  PageRank
Yang Li       1      6          1.48
Wenzhuo Song  2      5          2.77
Bo Yang       3      822        64.08