Title
OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures.
Abstract
As we integrate data-parallel GPUs with general-purpose CPUs on a single chip, the enormous cache traffic generated by the GPUs will not only exhaust the limited cache capacity but also severely interfere with CPU requests. Such heterogeneous multicores pose significant challenges to the design of the shared last-level cache (LLC). This problem can be mitigated by replacing the SRAM LLC with emerging non-volatile memories such as Spin-Transfer Torque RAM (STT-RAM), which provides larger cache capacity and near-zero leakage power. However, without careful design, the slow write operations of STT-RAM may offset the capacity benefit, and the system may still suffer from contention in the shared LLC and the on-chip interconnect. While existing cache optimization techniques can alleviate such problems, we reveal that the true potential of an STT-RAM LLC may still be limited: once the increased capacity has improved the cache hit rate, the on-chip network can become the performance bottleneck. CPU and GPU packets contend with each other for the shared network bandwidth. Moreover, the mixed-criticality read and write packets destined for STT-RAM add another layer of complexity to network resource allocation. Therefore, taking into account the disparate latency tolerance of CPU and GPU applications and the asymmetric read/write latency of STT-RAM, we propose OSCAR to Orchestrate STT-RAM Caches traffic for heterogeneous ARchitectures. Specifically, OSCAR integrates asynchronous batch scheduling and priority-based allocation in the on-chip interconnect to maximize the potential of an STT-RAM-based LLC. Simulation results on a 28-GPU and 14-CPU system demonstrate average improvements of 17.4% in CPU performance and 10.8% in GPU performance, along with 28.9% LLC energy savings, compared to an SRAM-based LLC design.
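The abstract names two interconnect mechanisms, batch scheduling and priority-based allocation, but does not detail them. The sketch below is a hypothetical, simplified illustration of how a network arbiter could combine the two ideas.

- **Batching.** Packets are first grouped into fixed time windows so that older batches always drain first. `BATCH_INTERVAL` is an assumed parameter.
- **Priority.** Within a batch, packets are ranked by an assumed priority table. The table favors latency-sensitive CPU traffic over latency-tolerant GPU traffic, and STT-RAM reads over slow writes.

This is not OSCAR's actual design. The names `Packet`, `Arbiter`, `arbitration_key`, `PRIORITY`, and `BATCH_INTERVAL` are assumptions, as is the priority ordering. The batching here is also synchronous and fixed-window, whereas the abstract describes OSCAR's batching as asynchronous.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass

# Hypothetical batching window (cycles); not a value from the paper.
BATCH_INTERVAL = 64

# Hypothetical priority classes (lower value is granted first):
# CPU traffic is treated as latency-sensitive and GPU traffic as
# latency-tolerant; reads are ranked ahead of slow STT-RAM writes.
PRIORITY = {
    ("cpu", "read"): 0,
    ("gpu", "read"): 1,
    ("cpu", "write"): 2,
    ("gpu", "write"): 3,
}


@dataclass
class Packet:
    pid: int           # unique packet id (also the final tie-breaker)
    source: str        # "cpu" or "gpu"
    kind: str          # "read" or "write" to the STT-RAM LLC
    inject_cycle: int  # cycle at which the packet entered the network


def arbitration_key(pkt: Packet) -> tuple[int, int, int, int]:
    """Sort key: older batch first, then priority class, then age, then id."""
    batch = pkt.inject_cycle // BATCH_INTERVAL  # older batches drain first
    return (batch, PRIORITY[(pkt.source, pkt.kind)], pkt.inject_cycle, pkt.pid)


class Arbiter:
    """Grants one packet at a time from a min-heap ordered by arbitration_key."""

    def __init__(self) -> None:
        self._heap: list[tuple[tuple[int, int, int, int], Packet]] = []

    def enqueue(self, pkt: Packet) -> None:
        # Assuming pids are unique, every key is unique, so the heap
        # never falls back to comparing Packet objects themselves.
        heapq.heappush(self._heap, (arbitration_key(pkt), pkt))

    def grant(self) -> Packet | None:
        """Pop the highest-priority packet, or return None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]


if __name__ == "__main__":
    arb = Arbiter()
    for p in [
        Packet(0, "gpu", "write", 3),
        Packet(1, "gpu", "read", 10),
        Packet(2, "cpu", "write", 20),
        Packet(3, "cpu", "read", 30),
        Packet(4, "cpu", "read", 70),  # falls into the next batch
    ]:
        arb.enqueue(p)
    print([arb.grant().pid for _ in range(5)])  # prints [3, 1, 2, 0, 4]
```

Placing the batch number first in the key bounds how long a low-priority GPU write can be bypassed. Once its batch becomes the oldest, no newer packet can overtake it. With priority alone, a steady stream of CPU reads could starve it indefinitely.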
Year
2016
DOI
10.5555/3195638.3195672
Venue
MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 2016
Keywords
OSCAR, STT-RAM cache traffic, heterogeneous CPU-GPU architectures, shared last-level cache, spin-transfer torque RAM, near-zero leakage power, heterogeneous multicores, shared network bandwidth, network resource allocation
Field
Bottleneck, Cache pollution, Cache, Computer science, Parallel computing, Static random-access memory, Cache algorithms, Real-time computing, Resource allocation, Work stealing, Cache coloring, Embedded system
DocType
Conference
ISSN
1072-4451
ISBN
978-1-4503-4952-9
Citations
4
PageRank
0.41
References
31
Authors
5
Name | Order | Citations | PageRank
Jia Zhan | 1 | 87 | 5.45
Onur Kayıran | 2 | 356 | 13.47
Gabriel H. Loh | 3 | 2481 | 134.10
Chita R. Das | 4 | 1038 | 59.34
Yuan Xie | 5 | 6430 | 407.00