Abstract |
---|
This paper proposes a continuous health-check approach for detecting Silent Data Corruption (SDC) in High Performance Computing (HPC) environments. The goal is to minimize the effect of hardware errors on the overall reliability and accuracy of the system by monitoring and validating the accuracy of data. Our work compares two approaches to overcoming SDC and presents the advantages and shortcomings of each. Our research shows that, of the two proposed methods, threshold-triggered and continuous verification, the latter is superior in terms of latency. |

Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3351556.3351567 | Proceedings of the 9th Balkan Conference on Informatics |

Keywords | Field | DocType |
---|---|---|
HPC, Redundancy, Silent Data Corruption | Data science, Data mining, Silent data corruption, Supercomputer, Computer science | Conference |

ISBN | Citations | PageRank |
---|---|---|
978-1-4503-7193-3 | 0 | 0.34 |

References | Authors |
---|---|
0 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Era Ajdaraga Krluku | 1 | 0 | 0.34 |
Marjan Gusev | 2 | 292 | 68.27 |
Vladimir Zdraveski | 3 | 7 | 4.69 |