Title
HARDFS: Hardening HDFS with Selective and Lightweight Versioning
Abstract
We harden the Hadoop Distributed File System (HDFS) against fail-silent (non-fail-stop) behaviors that result from memory corruption and software bugs using a new approach: selective and lightweight versioning (SLEEVE). With this approach, actions performed by important subsystems of HDFS (e.g., namespace management) are checked by a second implementation of the subsystem that uses lightweight, approximate data structures. We show that HARDFS detects and recovers from a wide range of fail-silent behaviors caused by random bit flips, targeted corruptions, and real software bugs. In particular, HARDFS handles 90% of the fail-silent faults that result from random memory corruption and correctly detects and recovers from 100% of 78 targeted corruptions and 5 real-world bugs. Moreover, by using micro-recovery, it recovers orders of magnitude faster than a full reboot. The extra protection in HARDFS incurs minimal performance and space overheads.
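The abstract describes SLEEVE only at a high level, so the following is a minimal sketch under our own assumptions, not the HARDFS implementation. A hypothetical `CheckedNamespace` keeps an exact primary map (path to block count) alongside a second, lightweight version that stores only hashed `(path, count)` facts in a counting Bloom filter (our choice of approximate structure). Before the primary's value for a file is trusted, the shadow version checks it. Because a Bloom filter has no false negatives, a miss is a definite sign of a fail-silent fault; a rare false positive can only let a corruption go unnoticed. The names `CountingBloomFilter`, `FailSilentError`, `CheckedNamespace`, `add_block`, and `block_count` are all illustrative.

```python
import hashlib


class CountingBloomFilter:
    """Approximate multiset of strings: no false negatives, rare false positives."""

    def __init__(self, size=1 << 16, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size

    def _indexes(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.counts[idx] += 1

    def remove(self, item):
        # Only call for items previously added, or counters may underflow.
        for idx in self._indexes(item):
            self.counts[idx] -= 1

    def might_contain(self, item):
        return all(self.counts[idx] > 0 for idx in self._indexes(item))


class FailSilentError(Exception):
    """Hypothetical signal: primary state disagrees with the shadow version."""


class CheckedNamespace:
    """Hypothetical namespace: exact primary state plus an approximate shadow."""

    def __init__(self):
        # Primary version: full, exact metadata (path -> block count).
        self.primary = {}
        # Shadow version: only hashed "path:count" facts, in fixed-size counters.
        self.shadow = CountingBloomFilter()

    def _check(self, path):
        if path not in self.primary:
            raise FileNotFoundError(path)
        count = self.primary[path]
        # A filter miss is definitive, so a mismatch means corruption or a bug.
        if not self.shadow.might_contain(f"{path}:{count}"):
            raise FailSilentError(f"{path}: primary reports {count} blocks")
        return count

    def create(self, path):
        if path in self.primary:
            raise FileExistsError(path)
        self.primary[path] = 0
        self.shadow.add(f"{path}:0")

    def add_block(self, path):
        old = self._check(path)  # verify before trusting the primary's value
        self.primary[path] = old + 1
        self.shadow.remove(f"{path}:{old}")
        self.shadow.add(f"{path}:{old + 1}")

    def block_count(self, path):
        return self._check(path)


if __name__ == "__main__":
    ns = CheckedNamespace()
    ns.create("/logs/a")
    ns.add_block("/logs/a")
    ns.add_block("/logs/a")
    print(ns.block_count("/logs/a"))  # prints 2

    ns.primary["/logs/a"] = 6  # simulate a bit flip in the primary (2 -> 6)
    try:
        ns.block_count("/logs/a")
    except FailSilentError as err:
        print("detected:", err)
```

The trade-off is that the shadow holds small counters instead of full metadata, which keeps its space cost low but makes its answers approximate. Recovery after detection (micro-recovery in the abstract) is out of scope here; the sketch only raises an exception.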
Year: 2013
Venue: FAST
Keywords: hardfs detects, hardening hdfs, fail-silent behavior, fail-silent fault, random bit, random memory corruption, new approach, memory corruption, targeted corruption, real software bug, lightweight versioning
Field: Distributed File System, Reboot, Data structure, Memory corruption, Computer science, Parallel computing, Software bug, Real-time computing, Namespace, Operating system, Software versioning
DocType: Conference
Citations: 9
PageRank: 0.49
References: 41
Authors: 6
Name | Order | Citations | PageRank
Thanh Do | 1 | 156 | 7.11
Tyler Harter | 2 | 225 | 12.32
Yingchao Liu | 3 | 11 | 0.87
Haryadi S. Gunawi | 4 | 554 | 36.58
Andrea C. Arpaci-Dusseau | 5 | 3133 | 307.84
Remzi H. Arpaci-Dusseau | 6 | 3120 | 383.86