Title
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
Abstract
The replacement policies for last-level caches (LLCs) are usually designed based on the access information available locally at the LLC. These policies are inherently sub-optimal due to a lack of information about the activities in the inner levels of the hierarchy. This paper introduces cache hierarchy-aware replacement (CHAR) algorithms for inclusive LLCs (or L3 caches) and applies the same algorithms to implement efficient bypass techniques for exclusive LLCs in a three-level hierarchy. In a hierarchy with an inclusive LLC, these algorithms mine the L2 cache eviction stream and decide, based on the access pattern of the evicted block, whether a block evicted from the L2 cache should be made a victim candidate in the LLC. Ours is the first proposal that explores the possibility of using a subset of L2 cache eviction hints to improve the replacement algorithms of an inclusive LLC. The CHAR algorithm classifies the blocks residing in the L2 cache based on their reuse patterns and dynamically estimates the reuse probability of each class of blocks to generate selective replacement hints to the LLC. Compared to the static re-reference interval prediction (SRRIP) policy, our proposal offers an average reduction of 10.9% in LLC misses and an average improvement of 3.8% in instructions retired per cycle (IPC) for twelve single-threaded applications. The corresponding reduction in LLC misses for one hundred 4-way multi-programmed workloads is 6.8%, leading to an average improvement of 3.9% in throughput. Finally, our proposal achieves an 11.1% reduction in LLC misses and a 4.2% reduction in parallel execution cycles for six 8-way threaded shared-memory applications compared to the SRRIP policy. In a cache hierarchy with an exclusive LLC, our CHAR proposal offers an effective algorithm for selecting the subset of blocks (clean or dirty) evicted from the L2 cache that need not be written to the LLC and can be bypassed.
Compared to the TC-AGE policy (the analogue of SRRIP for exclusive LLCs), our best exclusive LLC proposal improves average throughput by 3.2% while saving an average of 66.6% of data transactions from the L2 cache to the on-die interconnect for one hundred 4-way multi-programmed workloads. Compared to an inclusive LLC design with an identical hierarchy, this corresponds to an average throughput improvement of 8.2% with only 17% more data write transactions originating from the L2 cache.
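The abstract describes CHAR only at a high level. The Python sketch below illustrates the general idea under stated assumptions: one LLC set managed by SRRIP (2-bit re-reference prediction values, fills at RRPV 2, promotion to 0 on a hit, victim = first block at RRPV 3), per-class reuse counters trained on the L2 eviction stream, and a threshold test that turns an L2 eviction into either a demotion hint (inclusive LLC) or a bypass decision (exclusive LLC). The block classes (L2 hit count and prefetch origin), the counter widths, the 0.25 threshold, and every function name are hypothetical illustrations, not the paper's actual design.

```python
"""Sketch: hierarchy-aware hints on top of an SRRIP-managed LLC set.

Assumptions (not from the paper): the block-class features, the counter
sizes, the 0.25 dead-block threshold, and all function names are hypothetical.
"""

RRPV_MAX = 3     # 2-bit re-reference prediction value; 3 = "distant" (victim candidate)
RRPV_INSERT = 2  # SRRIP fills new blocks with a "long" re-reference prediction


class SRRIPSet:
    """One LLC set under static RRIP with hit promotion."""

    def __init__(self, ways):
        self.ways = ways
        self.rrpv = {}  # tag -> RRPV; dict insertion order stands in for way order

    def lookup(self, tag):
        if tag in self.rrpv:
            self.rrpv[tag] = 0  # hit: predict a near-immediate re-reference
            return True
        return False

    def insert(self, tag):
        if tag not in self.rrpv and len(self.rrpv) >= self.ways:
            self.evict()
        self.rrpv[tag] = RRPV_INSERT

    def evict(self):
        while True:
            victim = next((t for t, v in self.rrpv.items() if v == RRPV_MAX), None)
            if victim is not None:
                del self.rrpv[victim]
                return victim
            for t in list(self.rrpv):  # no distant block yet: age every block
                self.rrpv[t] += 1

    def demote(self, tag):
        """Apply a hierarchy-aware hint: make the block the next victim candidate."""
        if tag in self.rrpv:
            self.rrpv[tag] = RRPV_MAX


class ReuseEstimator:
    """Hypothetical per-class estimate of P(block is re-requested after L2 eviction)."""

    def __init__(self, num_classes=6, threshold=0.25, min_samples=4, counter_max=1023):
        self.evictions = [0] * num_classes
        self.reuses = [0] * num_classes
        self.threshold = threshold
        self.min_samples = min_samples
        self.counter_max = counter_max

    @staticmethod
    def classify(l2_hits, prefetched):
        # Assumed features: L2 hits while resident (capped at 2) and prefetch fill.
        return min(l2_hits, 2) * 2 + int(prefetched)

    def record_eviction(self, cls):
        self.evictions[cls] += 1
        if self.evictions[cls] > self.counter_max:  # halve both to track phase changes
            self.evictions[cls] //= 2
            self.reuses[cls] //= 2

    def record_reuse(self, cls):
        # Call when an L2 miss re-requests a block previously evicted from the L2.
        self.reuses[cls] = min(self.reuses[cls] + 1, self.evictions[cls])

    def reuse_probability(self, cls):
        if self.evictions[cls] < self.min_samples:
            return 1.0  # too little evidence: assume the block may be reused
        return self.reuses[cls] / self.evictions[cls]

    def is_dead(self, cls):
        return self.reuse_probability(cls) < self.threshold


def on_l2_eviction_inclusive(llc_set, est, tag, cls):
    """Inclusive LLC: the block is already resident, so only send a selective hint."""
    est.record_eviction(cls)
    if est.is_dead(cls):
        llc_set.demote(tag)
        return True
    return False


def on_l2_eviction_exclusive(llc_set, est, tag, cls, dirty, memory_writes):
    """Exclusive LLC: decide whether the L2 victim is filled into the LLC or bypassed."""
    est.record_eviction(cls)
    if est.is_dead(cls):
        if dirty:
            memory_writes.append(tag)  # a bypassed dirty block must still reach memory
        return "bypass"
    llc_set.insert(tag)
    return "fill"
```

In the inclusive case the hint only lowers the block's priority: the block stays resident until SRRIP selects it, and the sketch does not model the back-invalidation of L2 copies that an inclusive LLC performs on eviction. In the exclusive case a bypassed clean block is simply dropped, whereas a bypassed dirty block is still written to memory; only the L2-to-LLC fill is saved.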
Year: 2012
DOI: 10.1145/2370816.2370860
Venue: PACT
Keywords: average improvement, last-level cache, inclusive llc design, best exclusive llc proposal, hundred 4-way multi-programmed workloads, l3 cache, l2 cache eviction hint, introducing hierarchy-awareness, l2 cache eviction stream, inclusive llc, l2 cache, exclusive llc
Field: Computer science, Cache, CPU cache, Real-time computing, Throughput, Hierarchy, Shared memory, Cache pollution, Reuse, Parallel computing, Algorithm, Cache algorithms, Operating system
DocType: Conference
ISSN: 1089-795X
ISBN: 978-1-5090-6609-4
Citations: 23
PageRank: 0.63
References: 21
Authors: 5
Name | Order | Citations | PageRank
Mainak Chaudhuri | 1 | 300 | 18.86
Jayesh Gaur | 2 | 108 | 6.98
Nithiyanandan Bashyam | 3 | 23 | 0.63
Sreenivas Subramoney | 4 | 127 | 13.60
Joseph Nuzman | 5 | 75 | 3.89