Title
Cardio: CMP Adaptation for Reliability Through Dynamic Introspective Operation
Abstract
modern digital system includes in a single chip many components: processing cores, large caches, memory controllers, and hardware accelerators. Looking forward, future semiconductor technologies will enable even higher device integration, overall increasing system performance while reducing energy consumption. Unfortunately, prominent experts agree that such technologies will be prone to both permanent and transient faults within their lifetime. With the goal of addressing this issue, we propose Cardio: a low-cost architecture for reliable chip multiprocessors. Our solution is based on a novel hardware/software co-design where silicon failures are detected in hardware and system reconfiguration is managed in software. Comparing Cardio with a state-of-the-art hardware-based resiliency solution, Immunet, we found that our design can achieve a comparable fault response time while requiring a much lower area overhead. The proposed solution relies on a distributed resource manager to collect information about a CMP component's health, and leverages a synchronized distributed control mechanism to recover from permanent failures. Such architecture can operate as long as at least one general-purpose processor is still functional. Our experimental evaluation indicates that the overall performance impact of Cardio is as low as 4.5%, and its dynamic reconfiguration time upon fault detection is comprised between 20 and 50 thousand cycles.
Year
DOI
Venue
2014
10.1109/TCAD.2013.2284008
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Keywords
DocType
Volume
Hardware,Circuit faults,Software,Runtime,Software reliability,Routing
Journal
33
Issue
ISSN
Citations 
2
0278-0070
1
PageRank 
References 
Authors
0.35
33
2
Name
Order
Citations
PageRank
Andrea Pellegrini1623.55
Valeria Bertacco2136586.93