Abstract |
---|
Machine learning, graph analytics, and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular access patterns. A majority of these irregular accesses come from indirect patterns of the form A[B[i]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure caused by the lack of spatial locality. Evaluated on 7 applications, IMP shows a 56% speedup on average (up to 2.3×) compared to a baseline 64-core system with streaming prefetchers. This is within 23% of an idealized system. With partial cacheline accessing, we see a further 9.4% speedup on average (up to 46.6%). |
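As a minimal illustration of the A[B[i]] indirect access pattern the abstract describes, consider sparse matrix-vector multiply in CSR format. The sketch below is not from the paper; all names (`spmv_csr`, `row_ptr`, `col_idx`, `vals`) are illustrative. The index array is read sequentially, but the loads it drives into the dense vector jump unpredictably, which is why stride prefetchers fail on this kernel.

```c
#include <stddef.h>

/* Hypothetical sketch of sparse matrix-vector multiply (CSR format),
 * showing the indirect A[B[i]] access pattern that IMP targets.
 * Identifier names are illustrative, not taken from the paper. */
static void spmv_csr(size_t n_rows,
                     const size_t *row_ptr,  /* row start offsets, length n_rows+1 */
                     const size_t *col_idx,  /* column index per non-zero */
                     const double *vals,     /* non-zero values */
                     const double *x,        /* dense input vector */
                     double *y)              /* dense output vector */
{
    for (size_t r = 0; r < n_rows; r++) {
        double sum = 0.0;
        for (size_t k = row_ptr[r]; k < row_ptr[r + 1]; k++) {
            /* x[col_idx[k]] is the irregular access: col_idx (the "B"
             * array) streams sequentially, but the resulting loads from
             * x (the "A" array) have little spatial or temporal locality. */
            sum += vals[k] * x[col_idx[k]];
        }
        y[r] = sum;
    }
}
```

Because the index stream `col_idx` itself is sequential, a hardware prefetcher that reads it ahead of the core can compute the upcoming `x[col_idx[k]]` addresses, which is the opportunity IMP exploits.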
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2830772.2830807 | MICRO |
Keywords | Field | DocType |
---|---|---|
indirect memory prefetcher,IMP,machine learning,graph analytics,sparse linear algebra,irregular memory accesses,sparse matrix,non-zero elements,spatial locality,temporal locality,DRAM bandwidth pressure | Dram,Locality,Latency (engineering),Computer science,Parallel computing,Real-time computing,Bandwidth (signal processing),Multi-core processor,Sparse matrix,Branch predictor,Speedup | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-5090-6601-8 | 28 | 0.82 |
References | Authors |
---|---|
30 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xiangyao Yu | 1 | 270 | 16.17 |
Christopher J. Hughes | 2 | 988 | 63.34 |
Nadathur Satish | 3 | 2020 | 99.88 |
Srinivas Devadas | 4 | 8606 | 1146.30 |