Abstract
---
We study distributed machine learning in heterogeneous environments. We first conduct a systematic study of existing systems running distributed stochastic gradient descent and find that, although these systems work well in homogeneous environments, they can suffer performance degradation, sometimes up to 10x, in heterogeneous environments where stragglers are common, because their synchronization protocols are not designed for heterogeneous settings. Our first contribution is a heterogeneity-aware algorithm that applies a constant learning rate schedule to updates before adding them to the global parameter, which limits the harm stragglers inflict on convergence. As a further improvement, our second contribution is a more sophisticated learning rate schedule that accounts for the staleness of each update. We prove convergence for both approaches and implement a prototype system in the production cluster of our industrial partner Tencent Inc. We validate the performance of this prototype using a range of machine-learning workloads. Our prototype is 2-12x faster than other state-of-the-art systems, such as Spark, Petuum, and TensorFlow, and our proposed algorithm takes up to 6x fewer iterations to converge.
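The abstract's second contribution, a learning rate schedule that accounts for the staleness of each update, can be illustrated with a minimal sketch. The function names and the `1/(1 + staleness)` decay below are illustrative assumptions, not the paper's exact schedule: the idea is only that a gradient computed from old parameters gets a smaller step than a fresh one.

```python
import numpy as np

def apply_update(params, grad, base_lr, global_step, read_step):
    """Staleness-aware SGD update (illustrative sketch, not the paper's
    exact rule): scale the learning rate down by the number of global
    steps that elapsed since the worker read the parameters."""
    staleness = global_step - read_step       # delay of this update
    lr = base_lr / (1.0 + staleness)          # damp stale gradients
    return params - lr * grad

params = np.zeros(3)
grad = np.array([1.0, 1.0, 1.0])
# A fresh update (no delay) takes the full step of base_lr = 0.1.
fresh = apply_update(params, grad, base_lr=0.1, global_step=5, read_step=5)
# A straggler's update, 4 steps stale, is damped to 0.1 / 5 = 0.02.
stale = apply_update(params, grad, base_lr=0.1, global_step=5, read_step=1)
print(fresh)  # → [-0.1 -0.1 -0.1]
print(stale)  # → [-0.02 -0.02 -0.02]
```

Under such a schedule, a very delayed gradient from a straggler cannot drag the global parameter far in a stale direction, which is the intuition behind the convergence guarantee claimed in the abstract.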
Year | DOI | Venue
---|---|---
2017 | 10.1145/3035918.3035933 | SIGMOD Conference

Field | DocType | Citations
---|---|---
Convergence (routing), Data mining, Stochastic gradient descent, Synchronization, Spark (mathematics), Computer science, Homogeneous, Server, Database, Distributed computing | Conference | 37

PageRank | References | Authors
---|---|---
1.12 | 39 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Jiawei Jiang | 1 | 89 | 14.60 |
Bin Cui | 2 | 1843 | 124.59 |
Ce Zhang | 3 | 803 | 83.39 |
Lele Yu | 4 | 70 | 6.93 |