Title
Architecture and Compiler Support for GPUs Using Energy-Efficient Affine Register Files.
Abstract
A modern GPU can process thousands of hardware threads simultaneously. These threads are grouped into fixed-size SIMD batches that execute the same instruction on vectors of data in lockstep to achieve high throughput and performance. Because each SIMD group accesses a dedicated set of vector registers to allow fast context switching, the register files are huge, and their power consumption has become an important issue. One proposed solution is to replace some of the vector registers with scalar registers: different threads in the same SIMD group often operate on the same scalar values, so the redundant computations and accesses of these values can be eliminated. However, it has also been observed that a significant number of registers contain affine vectors υ such that υ[i] = b + i × s, which can be represented by a base b and a stride s. This article therefore proposes an affine register file design for GPUs that is energy efficient because it reduces the redundant execution of both uniform and affine vectors. The design uses a pair of registers to store the base and stride of each affine vector and provides dedicated affine ALUs to execute affine instructions. A compiler analysis has been developed to detect scalars and affine vectors and to annotate instructions so that they are handled by the corresponding scalar and affine computations. Furthermore, a priority-based register allocation scheme has been implemented to assign scalars and affine vectors to the appropriate scalar and affine register files. Experimental results show that the design dispatched 43.56% of the computations to scalar and affine ALUs when using eight scalar and four affine registers per warp. The design also reduced the energy consumption of the register files and ALUs to 21.86% and 26.54%, respectively, and reduced the overall energy consumption of the GPU by an average of 5.18%.
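The base-and-stride representation described in the abstract can be illustrated with a small software sketch. This is not the article's hardware design or compiler analysis; the names `Affine`, `uniform`, `compress`, and `WARP_SIZE`, and the warp width of 32, are assumptions made for illustration only. The sketch shows why adding two affine vectors, or scaling one by a uniform value, needs only two scalar operations instead of one operation per lane.

```python
from dataclasses import dataclass
from typing import List, Optional

WARP_SIZE = 32  # assumed SIMD width; the abstract does not state one


@dataclass(frozen=True)
class Affine:
    """Compact form of a vector v with v[i] = base + i * stride (hypothetical type)."""
    base: int
    stride: int

    def expand(self, width: int = WARP_SIZE) -> List[int]:
        # Materialise the full per-lane vector.
        return [self.base + i * self.stride for i in range(width)]

    def __add__(self, other: "Affine") -> "Affine":
        # (b1 + i*s1) + (b2 + i*s2) = (b1 + b2) + i*(s1 + s2)
        return Affine(self.base + other.base, self.stride + other.stride)

    def scale(self, k: int) -> "Affine":
        # k * (b + i*s) = k*b + i*(k*s); k must be the same in every lane.
        return Affine(self.base * k, self.stride * k)


def uniform(value: int) -> Affine:
    # A scalar (uniform) vector is the affine special case with stride 0.
    return Affine(value, 0)


def compress(vec: List[int]) -> Optional[Affine]:
    """Return the (base, stride) form of vec, or None if vec is not affine.

    This is a runtime check on concrete values, used here only to validate
    the representation; the article's detection is a compiler analysis.
    """
    if not vec:
        return None
    stride = vec[1] - vec[0] if len(vec) > 1 else 0
    if all(v == vec[0] + i * stride for i, v in enumerate(vec)):
        return Affine(vec[0], stride)
    return None


# Typical address computation: array_base + thread_id * element_size
thread_id = Affine(0, 1)
address = uniform(4096) + thread_id.scale(4)
print(address)            # Affine(base=4096, stride=4)
print(address.expand(4))  # [4096, 4100, 4104, 4108]
```

In this sketch the address computation touches two integers per operand rather than 32 lanes, which is the kind of redundancy the abstract says the affine register file and affine ALUs are meant to remove.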
Year
2018
DOI
10.1145/3133218
Venue
ACM Trans. Design Autom. Electr. Syst.
Keywords
Energy efficient, GPU, register allocation, register file organization
Field
Affine transformation, Register allocation, Computer science, Scalar (mathematics), Parallel computing, Register file, SIMD, Compiler, Thread (computing), Context switch
DocType
Journal
Volume
23
Issue
2
ISSN
1084-4309
Citations
2
PageRank
0.46
References
20
Authors
5
Name               Order  Citations  PageRank
Shao-Chung Wang    1      13         4.31
Li-Chen Kan        2      2          0.46
Chao-Lin Lee       3      2          0.46
Yuan-Shin Hwang    4      403        40.55
Jenq Kuen Lee      5      459        48.71