Title
CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators
Abstract
Specialized on-chip accelerators are widely used to improve the energy efficiency of computing systems. Recent advances in memory technology have enabled near-data accelerators (NDAs), which reside off-chip close to main memory and can yield greater benefits than on-chip accelerators. However, enforcing coherence with the rest of the system, which is already a major challenge for accelerators, becomes more difficult for NDAs. This is because (1) the cost of communication between NDAs and CPUs is high, and (2) NDA applications generate a large amount of off-chip data movement. As a result, as we show in this work, existing coherence mechanisms eliminate most of the benefits of NDAs. We extensively analyze these mechanisms, and observe that (1) the majority of off-chip coherence traffic is unnecessary, and (2) much of the off-chip traffic can be eliminated if a coherence mechanism has insight into the memory accesses performed by the NDA. Based on our observations, we propose CoNDA, a coherence mechanism that lets an NDA optimistically execute an NDA kernel, under the assumption that the NDA has all necessary coherence permissions. This optimistic execution allows CoNDA to gather information on the memory accesses performed by the NDA and by the rest of the system. CoNDA exploits this information to avoid performing unnecessary coherence requests, and thus significantly reduces data movement for coherence. We evaluate CoNDA using state-of-the-art graph processing and hybrid in-memory database workloads. Averaged across all of our workloads operating on modest data set sizes, CoNDA improves performance by 19.6% over the highest-performance prior coherence mechanism (66.0%/51.7% over a CPU-only/NDA-only system) and reduces memory system energy consumption by 18.0% over the most energy-efficient prior coherence mechanism (43.7% over CPU-only). CoNDA comes within 10.4% and 4.4% of the performance and energy of an ideal mechanism with no cost for coherence. The benefits of CoNDA increase with larger data sets, as CoNDA improves performance over the highest-performance prior coherence mechanism by 38.3% (8.4x/7.7x over CPU-only/NDA-only), and comes within 10.2% of an ideal no-cost coherence mechanism.
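The sketch below is an illustrative software model of the optimistic-execution idea described in the abstract: the NDA records its memory accesses in compressed signatures instead of issuing per-access off-chip coherence requests, and a single conflict check against CPU writes is performed when the kernel tries to commit. It is a minimal sketch only; the types and functions (signature_t, epoch_t, nda_load, nda_store, cpu_store, conda_commit) are hypothetical names for this example and do not reflect the paper's actual hardware design or interfaces.

```c
/*
 * Illustrative sketch: optimistic NDA execution with coherence resolved at
 * commit time, loosely modeled after the idea in the abstract. All names are
 * hypothetical and are NOT the paper's actual mechanism or interface.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SIGNATURE_BITS 256                 /* size of each access signature */
#define WORDS (SIGNATURE_BITS / 64)

/* Compressed record of which cache lines were touched (hashed bit vector). */
typedef struct {
    uint64_t bits[WORDS];
} signature_t;

static void sig_clear(signature_t *s) { memset(s->bits, 0, sizeof s->bits); }

static void sig_add(signature_t *s, uintptr_t cacheline_addr) {
    /* Hash the cache-line address into one bit of the signature. */
    uint64_t h = (cacheline_addr * 0x9E3779B97F4A7C15ULL) >> 8;
    s->bits[(h / 64) % WORDS] |= 1ULL << (h % 64);
}

static bool sig_intersects(const signature_t *a, const signature_t *b) {
    for (int i = 0; i < WORDS; i++)
        if (a->bits[i] & b->bits[i]) return true;
    return false;
}

/* Per-kernel bookkeeping gathered during optimistic execution. */
typedef struct {
    signature_t nda_reads;   /* lines the NDA kernel read          */
    signature_t nda_writes;  /* lines the NDA kernel wrote         */
    signature_t cpu_writes;  /* lines the CPU wrote in the interim */
} epoch_t;

/* Record accesses instead of sending per-access coherence requests off-chip. */
static void nda_load (epoch_t *e, uintptr_t line) { sig_add(&e->nda_reads,  line); }
static void nda_store(epoch_t *e, uintptr_t line) { sig_add(&e->nda_writes, line); }
static void cpu_store(epoch_t *e, uintptr_t line) { sig_add(&e->cpu_writes, line); }

/*
 * Commit: only now are the signatures compared. If the CPU wrote a line that
 * the NDA touched, the optimistic work is discarded and re-executed after
 * acquiring coherence permissions; otherwise the NDA's updates become visible.
 */
static bool conda_commit(const epoch_t *e) {
    bool conflict = sig_intersects(&e->cpu_writes, &e->nda_reads) ||
                    sig_intersects(&e->cpu_writes, &e->nda_writes);
    return !conflict;  /* true: commit succeeds with no per-access traffic */
}

int main(void) {
    epoch_t e;
    sig_clear(&e.nda_reads); sig_clear(&e.nda_writes); sig_clear(&e.cpu_writes);

    nda_load (&e, 0x1000);   /* NDA reads, e.g., a graph vertex */
    nda_store(&e, 0x2000);   /* NDA updates a result line       */
    cpu_store(&e, 0x9000);   /* CPU writes an unrelated line    */

    printf("commit %s\n", conda_commit(&e) ? "succeeded" : "failed, re-execute");
    return 0;
}
```

In this sketch, the hashed signatures act like small Bloom-filter-style sets, so the per-kernel coherence check collapses into one compressed comparison rather than a stream of off-chip coherence requests; false positives only cause a harmless re-execution, never a correctness violation.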
Year
2019
DOI
10.1145/3307650.3322266
Venue
Proceedings of the 46th International Symposium on Computer Architecture
Keywords
computing systems, memory technology, off-chip data movement, off-chip coherence traffic, memory accesses, NDA kernel, graph processing, hybrid in-memory database workloads, memory system energy consumption, energy-efficient prior coherence mechanism, specialized on-chip accelerators, energy efficiency, CoNDA, cache coherence support, coherence permissions, coherence requests
Field
Kernel (linear algebra), Graph, Data set, Computer science, Efficient energy use, Parallel computing, Exploit, Coherence (physics), Energy consumption, Computer engineering, Cache coherence
DocType
Conference
ISSN
1063-6897
ISBN
978-1-4503-6669-4
Citations
12
PageRank
0.43
References
74
Authors
11
Name                       Order  Citations  PageRank
Amirali Boroumand          1      155        5.20
Saugata Ghose              2      718        36.45
Minesh Patel               3      204        9.82
Hasan Hassan               4      352        17.76
Brandon Lucia              5      690        32.24
Rachata Ausavarungnirun    6      780        29.88
Kevin Hsieh                7      223        10.93
Nastaran Hajinazar         8      40         2.38
Krishna T. Malladi         9      249        18.37
Hongzhong Zheng            10     122        5.94
Onur Mutlu                 11     9446       357.40