Title
Understanding the Effects of DRAM Correctable Error Logging at Scale
Abstract
Fault tolerance poses a major challenge for future large-scale systems. Current research on fault tolerance has been principally focused on mitigating the impact of uncorrectable errors: errors that corrupt the state of the machine and require a restart from a known good state. However, correctable errors occur much more frequently than uncorrectable errors and may be even more common on future sy...
Year
2021
DOI
10.1109/Cluster48925.2021.00060
Venue
2021 IEEE International Conference on Cluster Computing (CLUSTER)
Keywords
Fault tolerance, Conferences, Fault tolerant systems, Random access memory, Cluster computing, Hardware, Large-scale systems
DocType
Conference
ISSN
1552-5244
ISBN
978-1-7281-9666-4
Citations
0
PageRank
0.34
References
0
Authors
5
Name                  Order  Citations  PageRank
Kurt Ferreira         1      639        40.78
Scott Levy            2      29         7.36
Victor Kuhns          3      0          0.34
Nathan DeBardeleben   4      490        31.71
Sean Blanchard        5      190        13.20