Title
TurboDL: Improving the CNN Training on GPU With Fine-Grained Multi-Streaming Scheduling
Abstract
Graphics Processing Units (GPUs) have evolved into powerful co-processors for training convolutional neural networks (CNNs), and many new features, such as concurrent kernel execution and Hyper-Q technology, have been introduced. Orchestrating concurrency for CNN training on GPUs is nevertheless challenging, since it may introduce synchronization overhead and poor resource utilization. Unlike previous research, which mainly focuses on single-layer or coarse-grained optimization, we introduce a critical-path-based, asynchronous parallelization mechanism and propose an optimization technique for CNN training that accounts for both the global network architecture and GPU resource usage. The proposed methods effectively overlap synchronization with computation in different streams, thereby accelerating the CNN training process. We have integrated our methods into Caffe. The experimental results show that Caffe integrated with our methods achieves a 1.30X speedup on average over Caffe+cuDNN, with even higher speedups for deeper, wider, and more complicated networks.
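The abstract's key mechanism is overlapping independent work across CUDA streams, which is what concurrent kernel execution and Hyper-Q enable at the hardware level. The following minimal sketch (not the paper's scheduler; the kernel, buffer sizes, and launch geometry are hypothetical) illustrates how two independent kernels launched on separate streams may execute concurrently instead of serializing on the default stream:

```cuda
// Illustrative sketch only: two independent kernels on separate CUDA
// streams can overlap on the GPU; this is the substrate TurboDL's
// fine-grained multi-streaming scheduling builds on.
#include <cuda_runtime.h>

__global__ void dummy_layer(float *buf, int n) {
    // Stand-in for one layer's per-element computation.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d0, *d1;
    cudaMalloc(&d0, n * sizeof(float));
    cudaMalloc(&d1, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Launches on different streams return immediately on the host and
    // have no ordering constraint between them, so the GPU is free to
    // run them concurrently (subject to resource availability).
    dummy_layer<<<(n + 255) / 256, 256, 0, s0>>>(d0, n);
    dummy_layer<<<(n + 255) / 256, 256, 0, s1>>>(d1, n);

    // Synchronize each stream only when its result is actually needed,
    // rather than blocking the whole device.
    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d0);
    cudaFree(d1);
    return 0;
}
```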
Year: 2021
DOI: 10.1109/TC.2020.2990321
Venue: IEEE Transactions on Computers
Keywords: Deep learning, parallelism optimization, scheduling, GPU
DocType: Journal
Volume: 70
Issue: 4
ISSN: 0018-9340
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name          Order  Citations  PageRank
Hai Jin       1      6544       644.63
Wenchao Wu    2      0          0.34
Xuanhua Shi   3      571        57.87
Ligang He     4      542        56.73
Bing B Zhou   5      0          0.34