Title | ||
---|---|---|
OC-DNN - Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. |
Abstract | ||
---|---|---|
Existing frameworks cannot train large DNNs that do not fit in GPU memory without explicit memory management schemes. In this paper, we propose OC-DNN - a novel Out-of-Core DNN training framework that exploits new Unified Memory features along with new hardware mechanisms in Pascal and Volta GPUs. OC-DNN has two major design components: 1) OC-Caffe, an enhanced version of Caffe that exploits innovative UM features like asynchronous prefetching, managed page-migration, exploitation of GPU-based page faults, and the cudaMemAdvise interface to enable efficient out-of-core training for very large DNNs, and 2) an interception library to transparently leverage these cutting-edge features for other frameworks. We provide a comprehensive performance characterization of our designs. OC-Caffe provides comparable performance (to Caffe) for regular DNNs. OC-Caffe-Opt is up to 1.9X faster than OC-Caffe-Naive and up to 5X faster than optimized CPU-based training for out-of-core workloads. OC-Caffe also allows scale-up (DGX-1) and scale-out on multi-GPU clusters. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/HiPC.2018.00024 | HiPC |
Keywords | Field | DocType |
---|---|---|
Training,Graphics processing units,Memory management,Hardware,Prefetching,Resource management | Resource management,Asynchronous communication,Computer science,CUDA,Caffè,Parallel computing,Exploit,Out-of-core algorithm,Memory management,Page fault | Conference |
ISSN | ISBN | Citations |
---|---|---|
1094-7256 | 978-1-5386-8386-6 | 3 |
PageRank | References | Authors |
---|---|---|
0.48 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ammar Ahmad Awan | 1 | 91 | 10.84 |
Ching-Hsiang Chu | 2 | 61 | 11.21 |
Hari Subramoni | 3 | 466 | 50.51 |
Xiaoyi Lu | 4 | 602 | 60.53 |
Dhabaleswar K. Panda | 5 | 5366 | 446.70 |