Title
Reliability Mechanisms for Very Large Storage Systems
Abstract
Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure of individual components; in systems with thousands of disks, even infrequent failures are likely in some device. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, introducing improved techniques for detecting errors in disk reads and fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs among other schemes between system availability and storage efficiency. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
Year
2003
DOI
10.1109/MASS.2003.1194851
Venue
IEEE Symposium on Mass Storage Systems
Keywords
large-scale storage system, individual storage device, sufficient reliability, high reliability, large storage systems, storage efficiency, reliability mechanisms, two-way mirroring, drive failure, three-way mirroring, disk failure, large storage system, storage system, redundancy, bandwidth, frequency, availability, raid, high performance computing, file servers
Field
File server, Non-standard RAID levels, Computer science, Standard RAID levels, Disk mirroring, RAID, Disk Data Format, Disk array controller, Parity drive, Distributed computing
DocType
Conference
ISSN
2160-195X
ISBN
0-7695-1914-8
Citations
99
PageRank
5.87
References
12
Authors
6
Name | Order | Citations | PageRank
Qin Xin | 1 | 237 | 15.41
Ethan L. Miller | 2 | 2870 | 281.96
Thomas Schwarz | 3 | 209 | 19.12
Darrell D. E. Long | 4 | 3111 | 536.40
Scott A. Brandt | 5 | 1663 | 94.81
Witold Litwin | 6 | 1937 | 928.21