Title
Communication Efficient SGD via Gradient Sampling with Bayes Prior
Abstract
Gradient compression has been widely adopted in data-parallel distributed training of deep neural networks to reduce communication overhead. Some prior work, such as the Top-k compressor, argues that large gradients are more important than small ones because they carry more information. Other mainstream methods, such as the random-k compressor and gradient quantization, usually treat all gradients equally. Different from all of them, we regard the selection of large and small gradients as exploitation and exploration of gradient information, respectively, and we find that taking both into account is the key to boosting the final accuracy. We therefore propose a novel gradient compressor, Gradient Sampling with Bayes Prior. Specifically, we sample important (large) gradients based on the global gradient distribution, which is periodically updated across multiple workers. We then introduce a Bayes prior into the distribution model to further explore the gradients. We prove the convergence of our method for smooth non-convex problems in the distributed setting. Unlike methods that chase high compression ratios at the expense of accuracy, we pursue no loss of accuracy together with real acceleration in practice. Experimental comparisons on a variety of computer vision tasks (e.g., image classification and object detection) and backbones (ResNet, MobileNetV2, InceptionV3, and AlexNet) show that our approach outperforms state-of-the-art techniques in terms of both speed and accuracy, under a compression ratio limit of 100x.
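The abstract describes the compressor only at a high level. Below is a minimal illustrative sketch, not the authors' implementation: gradient magnitudes are modeled with a simple zero-mean Gaussian (a stand-in for the paper's periodically updated global distribution, with the Bayes-prior update omitted), gradients above a tail threshold are kept (exploitation), and a small random share of the remaining gradients is sampled (exploration). All function names, ratios, and the Gaussian assumption are assumptions made for illustration.

# Illustrative sketch of distribution-guided gradient sampling (assumed names,
# not the paper's code): keep gradients above a magnitude threshold derived
# from a fitted distribution, plus a few randomly sampled small gradients.
import numpy as np
from scipy.special import erfcinv

def fit_sigma(grad):
    # Fit a zero-mean Gaussian to the gradient entries; the paper refreshes
    # the global distribution periodically across workers, here we fit once.
    return float(np.sqrt(np.mean(grad ** 2)) + 1e-12)

def sample_gradients(grad, sigma, keep_ratio=0.01, explore_share=0.1):
    # Split the communication budget: most of it for gradients above a
    # magnitude threshold (exploitation), the rest for uniformly sampled
    # small gradients (exploration). The ratios are illustrative only.
    g = grad.ravel()
    exploit_p = (1.0 - explore_share) * keep_ratio
    # Gaussian tail: P(|g| > t) = erfc(t / (sqrt(2) * sigma))  =>  solve for t.
    t = np.sqrt(2.0) * sigma * erfcinv(exploit_p)
    exploit_idx = np.flatnonzero(np.abs(g) > t)
    rest = np.setdiff1d(np.arange(g.size), exploit_idx, assume_unique=True)
    n_explore = min(int(explore_share * keep_ratio * g.size), rest.size)
    explore_idx = np.random.choice(rest, size=n_explore, replace=False)
    idx = np.concatenate([exploit_idx, explore_idx])
    return idx, g[idx]  # sparse (indices, values) to be communicated

# Usage: compress a 1M-entry gradient to roughly 1% of its entries.
grad = (np.random.randn(1_000_000) * 1e-3).astype(np.float32)
idx, vals = sample_gradients(grad, fit_sigma(grad), keep_ratio=0.01)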
Year
2021
DOI
10.1109/CVPR46437.2021.01189
Venue
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
DocType
Conference
ISSN
1063-6919
Citations
0
PageRank
0.34
References
0
Authors
7
Name            Order   Citations   PageRank
Liuyihan Song   1       4           2.15
Kang Zhao       2       20          5.11
Pan Pan         3       3           4.16
Yu Liu          4       198         25.45
Yingya Zhang    5       0           0.68
Yinghui Xu      6       172         20.23
Rong Jin        7       62063       34.26