Abstract
---
Processing-in-memory (PIM) architectures are promising for accelerating deep neural network (DNN) training because they enable low-latency, energy-efficient data movement between computation units and memory. This paper explores a novel GPU-PIM architecture for DNN training, in which the streaming multiprocessors of a GPU are integrated into the logic layer of a 3D memory stack, and multiple such stacks are connected to form a PIM network. Two corresponding optimization strategies are proposed. The first increases the computational parallelism of data-parallel training by exploiting the large memory capacity, high bandwidth, and fast network transmission of GPU-PIM. The second further applies optimized model-parallel training to significantly reduce communication overhead: a mapping scheme is proposed to decide the proper parallelization for each DNN layer on the proposed architecture. Experiments show that the proposed architecture outperforms the baseline GPU by 35.5% and 59.9% and reduces energy consumption by 28.2% and 27.8%, respectively, on the two evaluated benchmarks.
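The per-layer choice between data and model parallelism described in the abstract can be illustrated with a common traffic-based heuristic (a minimal sketch; the function name and decision rule below are assumptions, not the paper's actual mapping algorithm): data parallelism synchronizes weight gradients, so it suits layers with small weights and large activations (e.g., convolutions), while model parallelism exchanges activations, suiting weight-heavy fully connected layers.

```python
# Hedged sketch: per-layer parallelism selection in the spirit of the
# paper's mapping scheme. The heuristic is illustrative only; the
# paper's actual decision procedure may differ.

def choose_parallelism(weight_bytes: int, activation_bytes: int) -> str:
    """Pick the mode that minimizes inter-stack communication.

    Data parallelism replicates weights and exchanges weight gradients
    (traffic ~ weight size); model parallelism partitions weights and
    exchanges activations (traffic ~ activation size).
    """
    if weight_bytes <= activation_bytes:
        return "data-parallel"   # syncing small weight gradients is cheap
    return "model-parallel"      # exchanging small activations is cheap

# Conv layers (small weights, large activations) -> data-parallel;
# FC layers (large weights, small activations) -> model-parallel.
print(choose_parallelism(10_000, 1_000_000))    # data-parallel
print(choose_parallelism(50_000_000, 4_000))    # model-parallel
```

Minimizing the communicated bytes per training step is the usual rationale for mixing the two modes across layers of one network.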
Year | DOI | Venue
---|---|---
2022 | 10.1109/CCGrid54584.2022.00051 | 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

Keywords | DocType | ISBN
---|---|---
Hybrid Memory Cube, GPU, processing-in-memory, deep neural network | Conference | 978-1-6654-9957-6

Citations | PageRank | References
---|---|---
0 | 0.34 | 20
Authors (5)
---
Name | Order | Citations | PageRank |
---|---|---|---
Xiang Fei | 1 | 0 | 0.34 |
Jianhui Han | 2 | 0 | 0.34 |
Jianqiang Huang | 3 | 0 | 0.34 |
Weimin Zheng | 4 | 1889 | 182.48 |
Youhui Zhang | 5 | 202 | 28.36 |