Title | ||
---|---|---|
Architecture and Compiler Support for GPUs Using Energy-Efficient Affine Register Files. |
Abstract | ||
---|---|---|
A modern GPU can process thousands of hardware threads simultaneously. These threads are grouped into fixed-size SIMD batches that execute the same instruction on vectors of data in lockstep to achieve high throughput. The register files are huge because each SIMD group accesses a dedicated set of vector registers for fast context switching, and consequently the power consumption of register files has become an important issue. One proposed solution is to replace some of the vector registers with scalar registers: when the threads in the same SIMD group operate on scalar values, the redundant computations and accesses of these values can be eliminated. However, it has been observed that a significant number of registers contain affine vectors υ with υ[i] = b + i × s, which can be represented by a base b and a stride s. Therefore, this article proposes an affine register file design for GPUs that is energy efficient because it reduces the redundant executions of both uniform and affine vectors. This design uses a pair of registers to store the base and stride of each affine vector and provides specific affine ALUs to execute affine instructions. A compiler analysis has been developed to detect scalars and affine vectors and to annotate instructions to facilitate the corresponding scalar and affine computations. Furthermore, a priority-based register allocation scheme has been implemented to assign scalars and affine vectors to the appropriate scalar and affine register files. Experimental results show that this design dispatched 43.56% of the computations to scalar and affine ALUs when using eight scalar and four affine registers per warp. The design also reduced the energy consumption of the register files and ALUs to 21.86% and 26.54%, respectively, and reduced the overall energy consumption of the GPU by an average of 5.18%.
|
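
The affine representation described in the abstract can be illustrated with a small sketch. This is not the authors' hardware or compiler design; it only shows why a (base, stride) pair is enough to stand in for a full vector register under some operations. The names `Affine`, `uniform`, `scale`, and `expand`, and the warp size of 32, are assumptions made for illustration.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # assumption: 32 threads per warp; the abstract does not state a size


@dataclass(frozen=True)
class Affine:
    """Hypothetical model of an affine vector v[i] = base + i * stride."""
    base: int
    stride: int

    def __add__(self, other: "Affine") -> "Affine":
        # The sum of two affine vectors is affine: bases and strides add.
        return Affine(self.base + other.base, self.stride + other.stride)

    def scale(self, k: int) -> "Affine":
        # Multiplying by a uniform scalar k scales both base and stride.
        return Affine(self.base * k, self.stride * k)

    def expand(self, lanes: int = WARP_SIZE) -> list:
        # Materialize the full per-thread vector (what a vector register would hold).
        return [self.base + i * self.stride for i in range(lanes)]


def uniform(value: int) -> Affine:
    # A scalar (uniform) value is the special case with stride 0.
    return Affine(value, 0)


# A typical address computation: array_base + thread_id * element_size.
thread_id = Affine(0, 1)                      # lane i holds the value i
address = uniform(0x1000) + thread_id.scale(4)

print(address)               # the two stored values: base and stride
print(address.expand()[:4])  # first four per-thread values
```

In this sketch the address computation touches two integers per operand instead of 32, which is the kind of redundancy the abstract says the affine register file and affine ALUs are meant to remove.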
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3133218 | ACM Trans. Design Autom. Electr. Syst. |
Keywords | Field | DocType |
Energy efficient, GPU, register allocation, register file organization | Affine transformation, Register allocation, Computer science, Scalar (mathematics), Parallel computing, Register file, SIMD, Compiler, Thread (computing), Context switch | Journal |
Volume | Issue | ISSN |
23 | 2 | 1084-4309 |
Citations | PageRank | References |
2 | 0.46 | 20 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shao-Chung Wang | 1 | 13 | 4.31 |
Li-Chen Kan | 2 | 2 | 0.46 |
Chao-Lin Lee | 3 | 2 | 0.46 |
Yuan-Shin Hwang | 4 | 403 | 40.55 |
Jenq Kuen Lee | 5 | 459 | 48.71 |