Title
TurboDL: Improving the CNN Training on GPU With Fine-Grained Multi-Streaming Scheduling
Abstract
Graphics Processing Units (GPUs) have evolved into powerful co-processors for training convolutional neural networks (CNNs), and many new features, such as concurrent kernel execution and Hyper-Q technology, have been introduced. Orchestrating concurrency for CNN training on GPUs is nevertheless challenging, since it may introduce synchronization overhead and poor resource utilization. Unlike previous research, which mainly focuses on single-layer or coarse-grained optimization, we introduce a critical-path-based, asynchronous parallelization mechanism and propose an optimization technique for CNN training that accounts for both the global network architecture and GPU resource usage. The proposed methods effectively overlap synchronization with computation in different streams, thereby accelerating the CNN training process. We have integrated our methods into Caffe. The experimental results show that Caffe integrated with our methods achieves a 1.30X speedup on average over Caffe+cuDNN, with even higher speedups for deeper, wider, and more complicated networks.
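The abstract's key mechanism is overlapping independent work across CUDA streams, which is what concurrent kernel execution and Hyper-Q enable at the hardware level. The following minimal sketch (not the paper's scheduler; the kernel, buffer sizes, and launch geometry are hypothetical) illustrates how two independent kernels launched on separate streams may execute concurrently instead of serializing on the default stream:

```cuda
// Illustrative sketch only: two independent kernels on separate CUDA
// streams can overlap on the GPU; this is the substrate TurboDL's
// fine-grained multi-streaming scheduling builds on.
#include <cuda_runtime.h>

__global__ void dummy_layer(float *buf, int n) {
    // Stand-in for one layer's per-element computation.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d0, *d1;
    cudaMalloc(&d0, n * sizeof(float));
    cudaMalloc(&d1, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Launches on different streams return immediately on the host and
    // have no ordering constraint between them, so the GPU is free to
    // run them concurrently (subject to resource availability).
    dummy_layer<<<(n + 255) / 256, 256, 0, s0>>>(d0, n);
    dummy_layer<<<(n + 255) / 256, 256, 0, s1>>>(d1, n);

    // Synchronize each stream only when its result is actually needed,
    // rather than blocking the whole device.
    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d0);
    cudaFree(d1);
    return 0;
}
```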
Year: 2021
DOI: 10.1109/TC.2020.2990321
Venue: IEEE Transactions on Computers
Keywords: Deep learning, parallelism optimization, scheduling, GPU
DocType: Journal
Volume: 70
Issue: 4
ISSN: 0018-9340
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name          Order  Citations  PageRank
Hai Jin       1      6544       644.63
Wenchao Wu    2      0          0.34
Xuanhua Shi   3      571        57.87
Ligang He     4      542        56.73
Bing B Zhou   5      0          0.34