Abstract
---
Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and by at least 20% for a number of real-world benchmark models.
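As a rough illustration of the aggregation primitive the abstract describes, the Python sketch below simulates workers streaming fixed-size chunks of their model updates to an aggregator (standing in for the programmable switch), which returns only the element-wise sum. This is a minimal conceptual sketch, not the SwitchML protocol; the worker count, model size, and chunk size are illustrative assumptions.

```python
import numpy as np

NUM_WORKERS = 4
MODEL_SIZE = 1 << 10     # gradient elements per worker (assumed)
CHUNK_SIZE = 256         # elements aggregated per "packet" (assumed)

# Each worker produces a local gradient update.
rng = np.random.default_rng(0)
local_grads = [rng.standard_normal(MODEL_SIZE) for _ in range(NUM_WORKERS)]

def aggregate(chunks):
    """Switch-like step: element-wise sum of one chunk from every worker."""
    return np.sum(chunks, axis=0)

# Stream the model in chunks; every worker receives back only the summed
# chunk, i.e. the aggregate, rather than N separate per-worker updates.
aggregated = np.empty(MODEL_SIZE)
for start in range(0, MODEL_SIZE, CHUNK_SIZE):
    end = start + CHUNK_SIZE
    aggregated[start:end] = aggregate([g[start:end] for g in local_grads])

# Sanity check: the result equals the sum of all workers' gradients.
assert np.allclose(aggregated, np.sum(local_grads, axis=0))
```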
Field | Value
---|---
Year | 2019
Venue | arXiv: Distributed, Parallel, and Cluster Computing
DocType | Journal
Volume | abs/1903.06701
Citations | 3
PageRank | 0.38
References | 0
Authors | 10
Name | Order | Citations | PageRank |
---|---|---|---
Amedeo Sapio | 1 | 36 | 4.54 |
Marco Canini | 2 | 857 | 60.21 |
Chen-Yu Ho | 3 | 3 | 0.72 |
Jacob Nelson | 4 | 281 | 17.27 |
Panos Kalnis | 5 | 3297 | 141.30 |
Changhoon Kim | 6 | 1716 | 121.18 |
Arvind Krishnamurthy | 7 | 4540 | 312.24 |
Masoud Moshref | 8 | 263 | 13.73 |
Dan R. K. Ports | 9 | 445 | 22.52 |
Peter Richtárik | 10 | 1314 | 84.53 |