Title
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-Deep Neural Networks
Abstract
In mainstream DL frameworks, scarce GPU memory is the primary bottleneck that limits both the trainability and the training efficiency of ultra-deep neural networks (UDNNs). Prior memory optimization works focus on removing the trainability restriction but leave training efficiency out of consideration. To fill this gap, we present "AccUDNN", an accelerator that makes full use of the finite GPU memory resource to speed up UDNN training. AccUDNN comprises two modules: a memory optimizer and a hyperparameter tuner. The memory optimizer develops a novel performance-model-guided dynamic swap-out/in strategy that first ensures trainability and then remedies the efficiency degradation seen in other swapping strategies. The hyperparameter tuner then explores the efficiency-optimal minibatch size, and the learning rate matched to it, once the dynamic swapping strategy is applied. Evaluations demonstrate that AccUDNN cuts the GPU memory requirement of ResNet-152 from more than 24 GB to 8 GB. In turn, given a 12 GB GPU memory budget, the efficiency-optimal minibatch size reaches 4.2x that of Caffe, improving the scaling efficiency (speedup) of an 8-GPU cluster by 1.9x.
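The abstract's central mechanism, swapping activations out to pinned host memory during the forward pass and prefetching them back before the backward pass needs them, can be illustrated with a minimal CUDA sketch. This is not the authors' AccUDNN implementation: the layer count, buffer sizes, the fake_compute kernel, and the two-stream layout are illustrative assumptions, and a real memory optimizer would return each device buffer to a pool once its swap-out completes.

// Minimal sketch (assumed, not AccUDNN itself) of a dynamic swap-out/in
// scheme: each layer's activation is copied to pinned host memory on a
// transfer stream, overlapping the copy with the next layer's compute,
// then prefetched back to the GPU just before backward needs it.
// N_LAYERS, ACT_ELEMS, and fake_compute are hypothetical stand-ins.
#include <cuda_runtime.h>
#include <cstdio>

#define N_LAYERS 4
#define ACT_ELEMS (1 << 20)  // elements per activation tensor

__global__ void fake_compute(float* act, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) act[i] = act[i] * 0.5f + 1.0f;  // stand-in for a layer's math
}

int main() {
    float* d_act[N_LAYERS];   // per-layer device activations
    float* h_act[N_LAYERS];   // pinned host swap space
    cudaStream_t compute, transfer;
    cudaEvent_t produced[N_LAYERS], fetched[N_LAYERS];

    cudaStreamCreate(&compute);
    cudaStreamCreate(&transfer);
    for (int l = 0; l < N_LAYERS; ++l) {
        cudaMalloc((void**)&d_act[l], ACT_ELEMS * sizeof(float));
        cudaMemset(d_act[l], 0, ACT_ELEMS * sizeof(float));
        cudaMallocHost((void**)&h_act[l], ACT_ELEMS * sizeof(float));  // pinned => truly async copies
        cudaEventCreate(&produced[l]);
        cudaEventCreate(&fetched[l]);
    }

    // Forward: once layer l's activation is produced, swap it out on the
    // transfer stream while layer l+1 runs on the compute stream.
    for (int l = 0; l < N_LAYERS; ++l) {
        fake_compute<<<(ACT_ELEMS + 255) / 256, 256, 0, compute>>>(d_act[l], ACT_ELEMS);
        cudaEventRecord(produced[l], compute);
        cudaStreamWaitEvent(transfer, produced[l], 0);  // copy only after compute finishes
        cudaMemcpyAsync(h_act[l], d_act[l], ACT_ELEMS * sizeof(float),
                        cudaMemcpyDeviceToHost, transfer);
        // A real optimizer would return d_act[l] to a memory pool once the
        // copy's completion event fires; that reuse is the memory saving.
    }

    // Backward: prefetch layer l's activation on the transfer stream; the
    // compute stream waits only for that one copy, so the swap-in of layer
    // l-1 overlaps with layer l's backward kernel.
    for (int l = N_LAYERS - 1; l >= 0; --l) {
        cudaMemcpyAsync(d_act[l], h_act[l], ACT_ELEMS * sizeof(float),
                        cudaMemcpyHostToDevice, transfer);
        cudaEventRecord(fetched[l], transfer);
        cudaStreamWaitEvent(compute, fetched[l], 0);
        fake_compute<<<(ACT_ELEMS + 255) / 256, 256, 0, compute>>>(d_act[l], ACT_ELEMS);
    }
    cudaStreamSynchronize(compute);
    printf("done\n");

    for (int l = 0; l < N_LAYERS; ++l) {
        cudaFreeHost(h_act[l]);
        cudaFree(d_act[l]);
        cudaEventDestroy(produced[l]);
        cudaEventDestroy(fetched[l]);
    }
    cudaStreamDestroy(compute);
    cudaStreamDestroy(transfer);
    return 0;
}

The paper's performance-model guidance would decide which tensors to swap and when; the sketch only shows the stream-and-event plumbing that lets swapping overlap with compute instead of stalling it.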
Year
2019
DOI
10.1109/ICCD46524.2019.00017
Venue
2019 IEEE 37th International Conference on Computer Design (ICCD)
Keywords
ultra-deep neural network, GPU, memory optimization, speed up
Field
Bottleneck, Hyperparameter, Computer science, CUDA, Caffe, Parallel computing, Artificial intelligence, Deep learning, Artificial neural network, Tuner, Speedup
DocType
Conference
ISSN
1063-6404
ISBN
978-1-7281-1215-2
Citations
0
PageRank
0.34
References
14
Authors
8
Name            Order  Citations  PageRank
Jinrong Guo     1      7          2.55
Wantao Liu      2      73         8.29
Lili Wang       3      16         5.48
Chunrong Yao    4      3          1.76
Jizhong Han     5      355        54.72
Ruixuan Li      6      405        69.47
Yijun Lu        7      16         4.51
Songlin Hu      8      126        30.82