OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures. - Citegraph

Paper Info

Title
OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures.

Abstract
As we integrate data-parallel GPUs with general-purpose CPUs on a single chip, the enormous cache traffic generated by GPUs will not only exhaust the limited cache capacity, but also severely interfere with CPU requests. Such heterogeneous multicores pose significant challenges to the design of the shared last-level cache (LLC). This problem can be mitigated by replacing the SRAM LLC with emerging non-volatile memories like Spin-Transfer Torque RAM (STT-RAM), which provides larger cache capacity and near-zero leakage power. However, without careful design, the slow write operations of STT-RAM may offset the capacity benefit, and the system may still suffer from contention in the shared LLC and on-chip interconnects. While there are cache optimization techniques to alleviate such problems, we reveal that the true potential of an STT-RAM LLC may still be limited because, now that the cache hit rate has been improved by the increased capacity, the on-chip network can become a performance bottleneck. CPU and GPU packets contend with each other for the shared network bandwidth. Moreover, the mixed-criticality read/write packets to STT-RAM add another layer of complexity to the network resource allocation. Therefore, being aware of the disparate latency tolerance of CPU/GPU applications and the asymmetric read/write latency of STT-RAM, we propose OSCAR to Orchestrate STT-RAM Cache traffic for heterogeneous ARchitectures. Specifically, an integration of asynchronous batch scheduling and priority-based allocation for the on-chip interconnect is proposed to maximize the potential of the STT-RAM-based LLC. Simulation results on a 28-GPU and 14-CPU system demonstrate an average of 17.4% performance improvement for CPUs, 10.8% performance improvement for GPUs, and 28.9% LLC energy saving compared to an SRAM-based LLC design.

Year	DOI	Venue
2016	10.5555/3195638.3195672	MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, October 2016
Keywords	Field	DocType
OSCAR, STT-RAM cache traffic, heterogeneous CPU-GPU architectures, shared last-level cache, spin-transfer torque RAM, near-zero leakage power, heterogeneous multicores, shared network bandwidth, network resource allocation	Bottleneck, Cache pollution, Cache, Computer science, Parallel computing, Static random-access memory, Cache algorithms, Real-time computing, Resource allocation, Work stealing, Cache coloring, Embedded system	Conference
ISSN	ISBN	Citations
1072-4451	978-1-4503-4952-9	4
PageRank	References	Authors
0.41	31	5

Authors (5 rows)

Name	Order	Citations	PageRank
Jia Zhan	1	87	5.45
Onur Kayıran	2	356	13.47
Gabriel H. Loh	3	2481	134.10
Chita R. Das	4	1038	59.34
Yuan Xie	5	6430	407.00
