Title
GPNPU: Enabling Efficient Hardware-Based Direct Convolution with Multi-Precision Support in GPU Tensor Cores
Abstract
To accommodate DNN (Deep Neural Network) acceleration, GPUs have migrated to new architectures such as NVIDIA Volta and Turing that incorporate dedicated Tensor Cores. Although efficient at GEMM (general matrix-matrix multiplication), Tensor Cores remain inefficient for convolutions with certain layer structures. This paper proposes a GPNPU (General-Purpose Neural-network Processing Unit) architecture, which offers an alternative path of direct convolution on the GPU. It stitches the direct-convolution dataflow into the Tensor Cores with little additional hardware, and relies on a regulated data layout with stripe-mined convolution execution to achieve higher performance and power efficiency, while retaining the general programmability of a GPU. We further apply a unified core design that supports varied operand types and precisions for higher computing throughput. The evaluation shows that GPNPU can outperform Tensor Cores on typical DNNs by 1.4X for inference (FP16) and 1.2X for training with much reduced power; the INT8 speedup even reaches 2.4X. Our study demonstrates that it is possible and appealing to refine Tensor Cores for greater DNN acceleration, while conforming to the GPU architecture for the programmability needed in future DNN evolution.
Year
2020
DOI
10.1109/DAC18072.2020.9218566
Venue
2020 57th ACM/IEEE Design Automation Conference (DAC)
DocType
Conference
ISSN
0738-100X
ISBN
978-1-7281-1085-1
Citations
0
PageRank
0.34
References
0
Authors
7
Name            Order  Citations  PageRank
Zhuoran Song    1      9          4.39
Jianfei Wang    2      4          3.19
Li Tianjian     3      9          5.53
Li Jiang        4      286        31.86
Jing Ke         5      3          2.75
Xiaoyao Liang   6      585        45.81
Naifeng Jing    7      152        27.07