Title
HarpGBDT: Optimizing Gradient Boosting Decision Tree for Parallel Efficiency
Abstract
Gradient Boosting Decision Tree (GBDT) is a widely used machine learning algorithm. Its training involves irregular computation and random memory access, which makes it challenging to optimize at the system level. In this paper, we conduct a comprehensive performance analysis of two state-of-the-art systems, XGBoost and LightGBM. They represent two typical parallel implementations of GBDT: one is data parallel and the other is parallel over features. We identify substantial thread synchronization overhead as well as inefficient random memory access. We propose HarpGBDT, a new GBDT system designed from the perspective of parallel efficiency. First, we adopt a new tree growth method that selects the top K candidate tree nodes, enabling more levels of parallelism without sacrificing the algorithm's accuracy. Second, we organize the training data and model data in blocks and propose a block-wise approach as a general model for exploring various parallelism options. Third, we propose a mixed mode that applies different modes of parallelism in different phases of training to exploit the advantages of each. By tuning the block size and parallel mode, HarpGBDT attains better parallel efficiency. In extensive experiments on four datasets with different statistical characteristics on an Intel(R) Xeon(R) E5-2699 server, HarpGBDT is on average 8x faster than XGBoost and 2.6x faster than LightGBM.
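The top-K tree growth idea and the two parallelization axes mentioned above can be illustrated with a small sketch. This is a minimal, hypothetical Python illustration, not HarpGBDT's implementation. It assumes pre-binned integer features, an XGBoost-style second-order split gain (with the 1/2 factor and the gamma penalty omitted), and a heap of candidate leaves ranked by gain. The names `build_histogram`, `best_split` and `grow_topk` are illustrative.

```python
import heapq


def build_histogram(rows, bins, grad, hess, n_bins):
    """Accumulate per-feature, per-bin sums of gradients and hessians over
    the given rows. Splitting `rows` across threads would be data parallelism
    (each thread needs a private histogram to merge afterwards); splitting
    the feature loop would be feature parallelism."""
    n_features = len(bins[0])
    hist = [[[0.0, 0.0] for _ in range(n_bins)] for _ in range(n_features)]
    for r in rows:
        for f in range(n_features):
            cell = hist[f][bins[r][f]]
            cell[0] += grad[r]
            cell[1] += hess[r]
    return hist


def best_split(hist, lam=1.0):
    """Scan each feature's bins left to right and return (gain, feature, bin)
    for the best threshold, or (0.0, None, None) if no split has positive gain."""
    best = (0.0, None, None)
    for f, feat_hist in enumerate(hist):
        G = sum(c[0] for c in feat_hist)
        H = sum(c[1] for c in feat_hist)
        gl = hl = 0.0
        for b in range(len(feat_hist) - 1):  # last bin would put every row left
            gl += feat_hist[b][0]
            hl += feat_hist[b][1]
            gr, hr = G - gl, H - hl
            gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
            if gain > best[0]:
                best = (gain, f, b)
    return best


def grow_topk(bins, grad, hess, n_bins, k, max_leaves):
    """Grow one tree by repeatedly splitting the k candidate leaves with the
    largest gain. With k=1 this is best-first (leaf-wise) growth. The k nodes
    popped in one step are independent, so their histograms could be built in
    parallel; here they are processed sequentially for clarity."""
    all_rows = list(range(len(bins)))
    heap, leaves, splits = [], 1, []
    gain, f, b = best_split(build_histogram(all_rows, bins, grad, hess, n_bins))
    if f is not None:
        heapq.heappush(heap, (-gain, 0, all_rows, f, b))
    counter = 1  # unique tie-breaker so heapq never compares row lists
    while heap and leaves < max_leaves:
        batch = [heapq.heappop(heap)
                 for _ in range(min(k, len(heap), max_leaves - leaves))]
        for neg_gain, _, rows, f, b in batch:
            splits.append((f, b, -neg_gain))
            leaves += 1  # one leaf becomes two
            left = [r for r in rows if bins[r][f] <= b]
            right = [r for r in rows if bins[r][f] > b]
            for child in (left, right):
                g, cf, cb = best_split(build_histogram(child, bins, grad, hess, n_bins))
                if cf is not None:
                    heapq.heappush(heap, (-g, counter, child, cf, cb))
                    counter += 1
    return splits
```

For example, `grow_topk([[0], [0], [1], [1]], [-1.0, -1.0, 1.0, 1.0], [1.0] * 4, 2, k=2, max_leaves=4)` returns a single split on feature 0 at bin 0. In this sketch, a larger `k` exposes more independent nodes per step, which is one way to obtain the extra level of parallelism described in the abstract.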
Year: 2019
DOI: 10.1109/CLUSTER.2019.8890990
Venue: 2019 IEEE International Conference on Cluster Computing (CLUSTER)
Keywords: Machine learning algorithms, Parallel algorithms, Performance evaluation, Multithreading
Field: Block size, Histogram, Decision tree, Data modeling, Computer science, Parallel computing, Boosting (machine learning), Xeon, Synchronization (computer science), Computation
DocType: Conference
ISSN: 1552-5244
ISBN: 978-1-7281-4735-2
Citations: 0
PageRank: 0.34
References: 14
Authors: 10
Name                Order  Citations  PageRank
Bo Peng             1      9          2.91
Judy Qiu            2      3          2.07
Langshi Chen        3      1          1.70
Jiayu Li            4      0          1.01
Miao Jiang          5      12         1.66
Selahattin Akkas    6      0          0.34
Egor Smirnov        7      0          0.34
Ruslan Israfilov    8      0          0.68
Sergey Khekhnev     9      0          0.34
Andrey Nikolaev     10     0          1.01