Abstract |
---|
GPUs are bottlenecked by off-chip communication bandwidth and its energy cost, which makes near-data acceleration particularly attractive for GPUs. Integrating the accelerators within DRAM can mitigate these bottlenecks and additionally expose the accelerators to the higher internal bandwidth of DRAM. However, such an integration is challenging because the accelerators must have low overhead while still supporting a diverse set of applications. To enable the integration, this work leverages the approximability of GPU applications and uses the neural transformation, which converts diverse regions of code mainly into Multiply-Accumulate (MAC) operations. Furthermore, to preserve the SIMT execution model of GPUs, we propose a novel approximate MAC unit with significantly smaller area overhead. Building on these ideas, this work introduces AxRam, a novel DRAM architecture that integrates several approximate MAC units. AxRam offers this integration without increasing the memory column pitch or modifying the internal architecture of the DRAM banks. Our results with 10 GPGPU benchmarks show that, on average, AxRam provides 2.6× speedup and 13.3× energy reduction over a baseline GPU with no acceleration. These benefits are achieved while reducing the overall DRAM system power by 26%, at an area cost of merely 2.1%. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3243176.3243188 | PACT |

Field | DocType | ISBN |
---|---|---|
DRAM, Computer science, Parallel computing, Transactional memory, Bandwidth (signal processing), Execution model, General-purpose computing on graphics processing units, Acceleration, Multi-core processor, Speedup | Conference | 978-1-4503-5986-3 |

Citations | PageRank | References |
---|---|---|
4 | 0.41 | 54 |

Authors |
---|
6 |

Name | Order | Citations | PageRank |
---|---|---|---|
Amir Yazdanbakhsh | 1 | 241 | 15.28 |
Choungki Song | 2 | 11 | 1.71 |
Jacob Sacks | 3 | 20 | 2.74 |
P. Lotfi-Kamran | 4 | 396 | 22.26 |
H. Esmaeilzadeh | 5 | 1443 | 69.71 |
Nam Sung Kim | 6 | 3268 | 225.99 |