Title | ||
---|---|---|
Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes. |
Abstract | ||
---|---|---|
Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although larger mini-batch sizes can improve system scalability by reducing the communication-to-computation ratio, they may hurt the generalization ability of the models. To address this, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) We propose a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy. (2) We propose an optimization approach for extremely large mini-batch sizes (up to 64k) that can train CNN models on the ImageNet dataset without losing accuracy. (3) We propose highly optimized all-reduce algorithms that achieve up to 3x and 11x speedups on AlexNet and ResNet-50, respectively, over NCCL-based training on a cluster with 1024 Tesla P40 GPUs. For ResNet-50 trained for 90 epochs, the state-of-the-art GPU-based system with 1024 Tesla P100 GPUs took 15 minutes and achieved 74.9% top-1 test accuracy, and a KNL-based system with 2048 Intel KNLs took 20 minutes and achieved 75.4% accuracy. Our training system achieves 75.8% top-1 test accuracy in only 6.6 minutes using 2048 Tesla P40 GPUs. When training AlexNet for 95 epochs, our system achieves 58.7% top-1 test accuracy within 4 minutes, which also outperforms all other existing systems. |
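
The abstract's point that larger mini-batches reduce the communication-to-computation ratio can be illustrated with simple arithmetic. The sketch below is not from the paper: it assumes the ImageNet-1k (ILSVRC-2012) training set of 1,281,167 images, reads "64k" as 65,536, and uses a simplified model in which synchronous SGD performs one gradient all-reduce round per training step. The helper name `allreduce_rounds` is hypothetical.

```python
import math

# Assumption: ImageNet-1k (ILSVRC-2012) training set size.
IMAGENET_TRAIN_IMAGES = 1_281_167


def allreduce_rounds(batch_size, epochs, dataset_size=IMAGENET_TRAIN_IMAGES):
    """Count gradient synchronization rounds in a simplified model.

    Assumes one all-reduce round per mini-batch step; real systems may
    split or fuse gradients into several collective calls per step.
    """
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # Compare a small, a large, and the "64k" global mini-batch size
    # for a 90-epoch ResNet-50 style schedule.
    for batch_size in (256, 8192, 65536):
        print(batch_size, allreduce_rounds(batch_size, epochs=90))
```

Under these assumptions, a 90-epoch run needs 450,450 synchronization rounds at a batch size of 256 but only 1,800 at 65,536, while the total number of images processed stays the same.
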
Year | Venue | Field |
---|---|---|
2018 | arXiv: Learning | Mixed precision, Stochastic gradient descent, Training system, Parallel computing, Data parallelism, Artificial intelligence, Throughput, Deep learning, Machine learning, Mathematics, Speedup, Scalability |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1807.11205 | 26 |
PageRank | References | Authors |
---|---|---|
0.82 | 17 | 14 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xianyan Jia | 1 | 26 | 1.49 |
Shutao Song | 2 | 26 | 0.82 |
Wei He | 3 | 29 | 10.01 |
Wang Yangzihao | 4 | 178 | 7.24 |
Haidong Rong | 5 | 26 | 0.82 |
Feihu Zhou | 6 | 26 | 1.15 |
Liqiang Xie | 7 | 26 | 0.82 |
Zhenyu Guo | 8 | 26 | 1.49 |
Yuanzhou Yang | 9 | 26 | 0.82 |
Liwei Yu | 10 | 26 | 1.15 |
Tiegang Chen | 11 | 26 | 0.82 |
Guangxiao Hu | 12 | 26 | 0.82 |
Shaohuai Shi | 13 | 41 | 4.62 |
Xiaowen Chu | 14 | 1273 | 101.81 |