Title: Heterogeneity-aware Distributed Parameter Servers
Abstract: In this work, we study distributed machine learning in heterogeneous environments. We first conduct a systematic study of existing systems running distributed stochastic gradient descent; we find that, although these systems work well in homogeneous environments, they can suffer performance degradation, sometimes up to 10x, in heterogeneous environments where stragglers are common, because their synchronization protocols are not suited to heterogeneous settings. Our first contribution is a heterogeneity-aware algorithm that applies a constant learning rate schedule to updates before adding them to the global parameter, which limits the harm stragglers can do to convergence. As a further improvement, our second contribution is a more sophisticated learning rate schedule that takes into account the delay of each update. We theoretically prove the convergence of both approaches and implement a prototype system in the production cluster of our industrial partner, Tencent Inc. We validate the performance of this prototype on a range of machine learning workloads. Our prototype is 2-12x faster than state-of-the-art systems such as Spark, Petuum, and TensorFlow, and our proposed algorithm takes up to 6x fewer iterations to converge.
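The abstract only sketches the delay-aware learning rate idea; the paper's exact schedule is not reproduced here. The following is a minimal Python sketch under an assumed schedule in which the parameter server damps each worker's update by a factor that decays with the update's staleness (base_lr / (1 + delay)); the class ParameterServer and the names pull, push, and base_lr are hypothetical and only illustrate the general mechanism, not the authors' implementation.

import numpy as np

class ParameterServer:
    """Toy parameter server that scales each update by a staleness-aware
    learning rate before applying it to the global parameter.

    Hypothetical sketch: the schedule eta = base_lr / (1 + delay) is an
    assumption, not the schedule proposed in the paper."""

    def __init__(self, dim, base_lr=0.1):
        self.w = np.zeros(dim)   # global parameter vector
        self.clock = 0           # number of updates applied so far
        self.base_lr = base_lr

    def pull(self):
        """Worker fetches the current parameter and the clock it was read at."""
        return self.w.copy(), self.clock

    def push(self, grad, read_clock):
        """Worker pushes a gradient computed on a possibly stale parameter;
        the delay counts how many updates were applied since it was read."""
        delay = self.clock - read_clock
        lr = self.base_lr / (1.0 + delay)   # assumed delay-aware schedule
        self.w -= lr * grad
        self.clock += 1

# Usage: a straggler's stale gradient is damped more than a fresh one.
ps = ParameterServer(dim=3)
w_fast, c_fast = ps.pull()
w_slow, c_slow = ps.pull()
ps.push(np.ones(3), c_fast)   # delay 0 -> full base_lr
ps.push(np.ones(3), c_slow)   # delay 1 -> base_lr / 2
print(ps.w)

In this sketch, the damping plays the role described in the abstract: updates from stragglers, which arrive with larger delays, are weighted down so they cannot drag the global parameter away from convergence.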
Year: 2017
DOI: 10.1145/3035918.3035933
Venue: SIGMOD Conference
Field: Convergence (routing), Data mining, Stochastic gradient descent, Synchronization, Spark (mathematics), Computer science, Homogeneous, Server, Database, Distributed computing
DocType: Conference
Citations: 37
PageRank: 1.12
References: 39
Authors: 4
Name            Order   Citations / PageRank
Jiawei Jiang    1       8914.60
Bin Cui         2       1843124.59
Ce Zhang        3       80383.39
Lele Yu         4       706.93