Title
A Simple Cache Coherence Scheme for Integrated CPU-GPU Systems
Abstract
This paper presents a novel approach to accelerating applications running on integrated CPU-GPU systems. Many integrated CPU-GPU systems use cache-coherent shared memory to communicate. For example, after the CPU produces data for the GPU, the GPU may pull the data into its cache when it accesses the data. In such a pull-based approach, data resides in a shared cache until the GPU accesses it, resulting in long load latency on the first GPU access to a cache line. In this work, we propose a new push-based coherence mechanism that explicitly exploits the CPU-GPU producer-consumer relationship by automatically moving data from the CPU to the GPU last-level cache. The proposed mechanism generally yields a dramatic reduction in the GPU L2 cache miss rate and a consequent increase in overall performance. Our experiments show that the proposed scheme can increase performance by up to 37%, with typical improvements in the 5-7% range. We find that even when the tested applications do not benefit from the proposed approach, their performance does not decrease with our technique. While we demonstrate how the proposed scheme can co-exist with traditional cache coherence mechanisms, we argue that it could also be used as a simpler replacement for existing protocols.
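The contrast between pull-based and push-based data movement described in the abstract can be illustrated with a toy model. The sketch below is not the paper's protocol: it assumes an unbounded GPU L2 with no evictions and no coherence states, and the names ToyCache and run are hypothetical. It only counts first-access misses when produced lines are, or are not, placed in the GPU cache ahead of time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Toy GPU L2: a set of resident line addresses (assumed unbounded, no eviction)."""
    lines: set = field(default_factory=set)
    hits: int = 0
    misses: int = 0

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines.add(addr)  # fill on miss, i.e. the GPU "pulls" the line


def run(produced, consumed, push):
    """CPU produces `produced` lines, then the GPU reads `consumed` lines.

    push=False: lines stay outside the GPU L2 until the GPU misses on them.
    push=True:  every produced line is placed in the GPU L2 before the GPU runs.
    Returns (misses, hits) observed by the toy GPU L2.
    """
    gpu_l2 = ToyCache()
    if push:
        gpu_l2.lines.update(produced)
    for addr in consumed:
        gpu_l2.access(addr)
    return gpu_l2.misses, gpu_l2.hits


produced = range(8)
consumed = list(range(8)) * 2  # the GPU reads each produced line twice

print("pull (misses, hits):", run(produced, consumed, push=False))
print("push (misses, hits):", run(produced, consumed, push=True))
```

In this toy setting the pull variant misses once per produced line, while the push variant has no first-access misses. A real system would also have to account for cache capacity, pollution from pushed lines the GPU never reads, and interaction with the existing coherence protocol, none of which is modeled here.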
Year
2020
DOI
10.1109/DAC18072.2020.9218664
Venue
Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC)
Keywords
Cache coherence, GPU, Integrated CPU/GPU
DocType
Conference
ISSN
0738-100X
Citations
0
PageRank
0.34
References
0
Authors
4
Name                          Order  Citations  PageRank
Ardhi Wiratama Baskara Yudha  1      1          1.71
Reza Pulungan                 2      79         8.84
Henry Hoffmann                3      1772       95.10
Yan Solihin                   4      2057       111.56