Abstract | ||
---|---|---|
We harden the Hadoop Distributed File System (HDFS) against fail-silent (non-fail-stop) behaviors that result from memory corruption and software bugs using a new approach: selective and lightweight versioning (SLEEVE). With this approach, actions performed by important subsystems of HDFS (e.g., namespace management) are checked by a second implementation of the subsystem that uses lightweight, approximate data structures. We show that the resulting system, HARDFS, detects and recovers from a wide range of fail-silent behaviors caused by random bit flips, targeted corruptions, and real software bugs. In particular, HARDFS handles 90% of the fail-silent faults that result from random memory corruption and correctly detects and recovers from 100% of 78 targeted corruptions and 5 real-world bugs. Moreover, it recovers orders of magnitude faster than a full reboot by using micro-recovery. The extra protection in HARDFS incurs minimal performance and space overheads. |
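
The idea of checking a subsystem against a second, lightweight version can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which approximate data structure is used, so a small Bloom filter stands in for it, and the names `BloomFilter`, `CheckedNamespace`, and `check` are hypothetical rather than taken from HARDFS or HDFS.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class CheckedNamespace:
    """Hypothetical namespace: a full 'main' version plus a compact shadow."""

    def __init__(self):
        self.main = {}               # full state: path -> metadata
        self.shadow = BloomFilter()  # approximate state: membership only

    def create(self, path, metadata):
        # Every update is applied to both versions.
        self.main[path] = metadata
        self.shadow.add(path)

    def check(self, path):
        """Cross-check the main version's answer against the shadow."""
        in_main = path in self.main
        in_shadow = self.shadow.might_contain(path)
        if in_main and not in_shadow:
            # The filter never forgets an added key, so this entry
            # was never created through create().
            return "phantom"
        if in_shadow and not in_main:
            # Either the main version lost the entry or the filter
            # produced a false positive; treat it as suspect.
            return "suspect-lost"
        return "ok"
```

In this sketch a disagreement between the two versions is what signals a possible fail-silent fault; a real system would then need a recovery step, which is outside the scope of this example. The shadow supports only insertions, so deletions would need a different structure, such as a counting filter.
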
Year | Venue | Keywords |
---|---|---|
2013 | FAST | hardfs detects,hardening hdfs,fail-silent behavior,fail-silent fault,random bit,random memory corruption,new approach,memory corruption,targeted corruption,real software bug,lightweight versioning |
Field | DocType | Citations |
---|---|---|
Distributed File System,Reboot,Data structure,Memory corruption,Computer science,Parallel computing,Software bug,Real-time computing,Namespace,Operating system,Software versioning | Conference | 9 |
PageRank | References | Authors |
---|---|---|
0.49 | 41 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Thanh Do | 1 | 156 | 7.11 |
Tyler Harter | 2 | 225 | 12.32 |
Yingchao Liu | 3 | 11 | 0.87 |
Haryadi S. Gunawi | 4 | 554 | 36.58 |
Andrea C. Arpaci-Dusseau | 5 | 3133 | 307.84 |
Remzi H. Arpaci-Dusseau | 6 | 3120 | 383.86 |