Abstract |
---|
GPUs rely on large register files to unlock thread-level parallelism for high throughput. Unfortunately, large register files are power hungry, making it important to seek new approaches to improve their utilization. This paper introduces a new register file organization for efficient register packing of narrow integer and floating-point operands, designed to leverage advances in static analysis. We show that the hardware/software co-designed register file organization yields a performance improvement of up to 79%, and 18.6% on average, at a modest output-quality degradation. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3404397.3404431 | ICPP |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexandra Angerd | 1 | 0 | 0.34 |
Erik Sintorn | 2 | 262 | 20.06 |
Per Stenström | 3 | 3048 | 234.09 |