Abstract |
---|
This paper presents a novel approach to accelerating applications that run on integrated CPU-GPU systems. Many integrated CPU-GPU systems use cache-coherent shared memory to communicate. For example, after the CPU produces data for the GPU, the GPU may pull the data into its cache when it accesses the data. In such a pull-based approach, data resides in a shared cache until the GPU accesses it, which results in long load latency on the GPU's first access to each cache line. In this work, we propose a new push-based coherence mechanism that explicitly exploits the producer-consumer relationship between the CPU and GPU by automatically moving data from the CPU to the GPU last-level cache. The proposed mechanism dramatically reduces the GPU L2 cache miss rate in general and consequently increases overall performance. Our experiments show that the proposed scheme can increase performance by up to 37%, with typical improvements in the 5-7% range. We find that even for tested applications that do not benefit from the proposed approach, performance does not decrease with our technique. While we demonstrate how the proposed scheme can coexist with traditional cache coherence mechanisms, we argue that it could also serve as a simpler replacement for existing protocols. |
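
The pull-versus-push distinction described in the abstract can be illustrated with a toy model. The Python sketch below is not the authors' mechanism or simulator. It assumes an infinite-capacity GPU L2 with no eviction, invalidation, or timing, a hypothetical 64-byte line size (`LINE_BYTES`), and hypothetical names (`ToyGpuL2`, `simulate`). It only counts GPU L2 hits and misses for a single phase in which the CPU produces a buffer and the GPU then consumes it.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # hypothetical cache-line size for this toy model


@dataclass
class ToyGpuL2:
    """Infinite-capacity set of resident line numbers; no eviction or invalidation."""
    resident: set = field(default_factory=set)
    hits: int = 0
    misses: int = 0

    def load(self, addr: int) -> None:
        line = addr // LINE_BYTES
        if line in self.resident:
            self.hits += 1
        else:
            # Pull-based: the GPU fetches the line from the shared cache on demand.
            self.misses += 1
            self.resident.add(line)

    def push(self, addr: int) -> None:
        # Push-based: the line is installed in the GPU L2 before any GPU load.
        self.resident.add(addr // LINE_BYTES)


def simulate(cpu_writes, gpu_reads, push_based):
    """Return (misses, hits) in the toy GPU L2 for one producer-consumer phase."""
    l2 = ToyGpuL2()
    if push_based:
        for addr in cpu_writes:
            l2.push(addr)
    for addr in gpu_reads:
        l2.load(addr)
    return l2.misses, l2.hits


if __name__ == "__main__":
    buf = range(0, 16 * LINE_BYTES, 4)  # CPU writes 16 lines as 4-byte words
    print("pull:", simulate(buf, buf, push_based=False))  # pull: (16, 240)
    print("push:", simulate(buf, buf, push_based=True))   # push: (0, 256)
```

Under these assumptions, the pull run misses once per produced line, while the push run has no misses. This mirrors the first-access latency that the abstract targets. A real design must also handle GPU L2 capacity, coherence state, and pushed data that the GPU never reads, none of which this sketch models.
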

Year | DOI | Venue |
---|---|---|
2020 | 10.1109/DAC18072.2020.9218664 | Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC) |

Keywords | DocType | ISSN |
---|---|---|
Cache coherence, GPU, Integrated CPU/GPU | Conference | 0738-100X |

Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |

Authors |
---|
4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Ardhi Wiratama Baskara Yudha | 1 | 1 | 1.71 |
Reza Pulungan | 2 | 79 | 8.84 |
Henry Hoffmann | 3 | 1772 | 95.10 |
Yan Solihin | 4 | 2057 | 111.56 |