Title
Re-NUCA: Boosting CMP Performance Through Block Replication
Abstract
Chip Multiprocessor (CMP) systems have become the reference architecture for designing microprocessors, thanks to advances in semiconductor nanotechnology that have steadily provided a growing number of faster and smaller transistors per chip. Interest in CMPs grew because classical techniques for boosting performance, such as increasing the clock frequency and the amount of work performed in each clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. CMP systems generally adopt private L1 caches and a large last-level cache (LLC), typically L2 or L3, shared among all cores. Since the miss resolution time of the private caches depends on the response time of the LLC, which is dominated by wire delay, overall performance is affected by wire delay. Non-Uniform Cache Architecture (NUCA) caches have been proposed for single-core and multi-core systems as a mechanism for tolerating the effect of wire delay on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed on different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the performance limitations that arise in classical dynamic NUCA (D-NUCA) caches from the conflict hit problem, in which a shared block accessed by cores on opposite sides keeps migrating back and forth and never settles close to any requester. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and that applications suffering strongly from the conflict hit problem see performance improvements of up to 15%.
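To make the conflict hit problem concrete, the following is a minimal toy sketch, not the Re-NUCA protocol itself. It assumes a single column of banks between two cores ("A" next to bank 0, "B" next to the last bank), a hypothetical latency of `BASE + HOP * distance` cycles, and one-bank-per-hit gradual migration. The function names, constants, and the read-only replication rule are all illustrative assumptions; the paper's actual replication policy, placement, and coherence handling are not reproduced here.

```python
# Hypothetical toy model: one column of n_banks banks between two cores.
# Core "A" sits next to bank 0, core "B" next to bank n_banks - 1.
# Assumed access latency (cycles): BASE + HOP * distance to the bank.
BASE, HOP = 3, 2


def latency(core_pos: int, bank: int) -> int:
    return BASE + HOP * abs(core_pos - bank)


def step_toward(bank: int, target: int) -> int:
    """Gradual promotion: on a hit, move the block one bank closer to the requester."""
    if bank < target:
        return bank + 1
    if bank > target:
        return bank - 1
    return bank


def dnuca_cost(trace: str, n_banks: int, start: int) -> int:
    """Single copy that migrates toward whichever core hits it (D-NUCA-like)."""
    core_pos = {"A": 0, "B": n_banks - 1}
    bank, total = start, 0
    for core in trace:
        total += latency(core_pos[core], bank)
        bank = step_toward(bank, core_pos[core])
    return total


def replicated_cost(trace: str, n_banks: int, start: int) -> int:
    """Each side keeps its own replica that migrates independently.

    Assumes read-only sharing, so no invalidations are modeled.
    """
    core_pos = {"A": 0, "B": n_banks - 1}
    copies = {"A": start, "B": start}
    total = 0
    for core in trace:
        total += latency(core_pos[core], copies[core])
        copies[core] = step_toward(copies[core], core_pos[core])
    return total


if __name__ == "__main__":
    trace = "AB" * 10  # the two cores alternate accesses to one shared block
    print(dnuca_cost(trace, 8, 4), replicated_cost(trace, 8, 4))  # prints 220 92
```

With alternating accesses, the single D-NUCA copy ping-pongs between banks 3 and 4 and every access costs 11 cycles, whereas each replica settles next to its own core. A real design must also keep replicas coherent on writes, which this sketch deliberately ignores.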
Year: 2010
DOI: 10.1109/DSD.2010.41
Venue: Digital System Design: Architectures, Methods and Tools
Keywords: block replication, performance improvement, shared cache, classical technique, cmp system, conflict hit problem, overall performance, classical d-nuca, boosting cmp performance, nuca cache, l1 cache, clock cycle, chip, cache memory, coherence, reference architecture, protocols
Field: Shared memory, CPU cache, Computer science, Parallel computing, Response time, Multiprocessing, Real-time computing, Reference architecture, Cycles per instruction, Multi-core processor, Clock rate, Embedded system
DocType: Conference
ISBN: 978-1-4244-7839-2
Citations: 5
PageRank: 0.41
References: 15
Authors: 4
Name | Order | Citations | PageRank
Pierfrancesco Foglia | 1 | 192 | 19.01
Cosimo Antonio Prete | 2 | 287 | 30.81
Marco Solinas | 3 | 59 | 3.96
Giovanna Monni | 4 | 5 | 0.41