Title
HarpLDA+: Optimizing latent dirichlet allocation for parallel efficiency
Abstract
Latent Dirichlet Allocation (LDA) is a machine learning technique widely used in topic modeling and data analysis. Training large LDA models on big datasets involves dynamic and irregular computation patterns and poses a major challenge to both algorithm optimization and system design. In this paper, we present a comprehensive benchmarking of our novel synchronized LDA training system, HarpLDA+, built on Hadoop and Java. It demonstrates impressive performance compared with three state-of-the-art MPI/C++ based systems: LightLDA, F+NomadLDA, and WarpLDA. HarpLDA+ uses optimized collective communication with timer-based control for load balancing, yielding stable scalability on both shared-memory and distributed systems. Our experiments demonstrate that HarpLDA+ effectively reduces synchronization and communication overhead and outperforms the other three LDA training systems.
Year
2017
DOI
10.1109/BigData.2017.8257932
Venue
2017 IEEE International Conference on Big Data (Big Data)
Keywords
parallel efficiency, latent dirichlet allocation, topic modeling, data analysis, LDA models, big datasets, irregular computation patterns, algorithm optimization, system design, comprehensive benchmarking, MPI/C++ based state-of-the-art systems, shared-memory, distributed systems, LDA training systems, machine learning technique, HarpLDA+, dynamic computation patterns, LDA training system, load balancing, Hadoop, Java
DocType
Conference
ISSN
2639-1589
ISBN
978-1-5386-2716-7
Citations
1
PageRank
0.36
References
0
Authors
12
Name              Order  Citations  PageRank
Bo Peng           1      9          2.91
Bingjing Zhang    2      521        25.17
Langshi Chen      3      1          0.36
Mihai Avram       4      1          0.70
Robert Henschel   5      106        10.85
Craig A. Stewart  6      259        42.68
Shaojuan Zhu      7      1          0.36
Emily Mccallum    8      1          0.36
Lisa Smith        9      1          0.36
Tom Zahniser      10     1          0.36
Jon Omer          11     1          0.36
Judy Qiu          12     3          2.07