Title
Effect of Replica Placement on the Reliability of Large-Scale Data Storage Systems
Abstract
Replication is a widely used method to protect large-scale data storage systems from data loss when storage nodes fail. It is well known that the placement of replicas of the different data blocks across the nodes affects the time to rebuild. Several systems described in the literature are designed based on the premise that minimizing the rebuild times maximizes the system reliability. Our results however indicate that the reliability is essentially unaffected by the replica placement scheme. We show that, for a replication factor of two, all possible placement schemes have mean times to data loss (MTTDLs) within a factor of two for practical values of the failure rate, storage capacity, and rebuild bandwidth of a storage node. The theoretical results are confirmed by means of event-driven simulation. For higher replication factors, an analytical derivation of MTTDL becomes intractable for a general placement scheme. We therefore use one of the alternate measures of reliability that have been proposed in the literature, namely, the probability of data loss during rebuild in the critical mode of the system. Whereas for a replication factor of two this measure can be directly translated into MTTDL, it is only speculative of the MTTDL behavior for higher replication factors. This measure of reliability is shown to lie within a factor of two for all possible placement schemes and any replication factor. We also show that for any replication factor, the clustered placement scheme has the lowest probability of data loss during rebuild in critical mode among all possible placement schemes, whereas the declustered placement scheme has the highest probability. Simulation results reveal however that these properties do not hold for the corresponding MTTDLs for a replication factor greater than two. This indicates that some alternate measures of reliability may not be appropriate for comparing the MTTDL of different placement schemes.
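The "within a factor of two" claim for a replication factor of two can be illustrated with a back-of-the-envelope sketch. This is not the paper's derivation; it is a first-order model under stated assumptions. It assumes exponential node failures at rate `lam`, `c` bytes stored per node, and rebuild bandwidth `b` per node. It also assumes that in the declustered scheme each surviving node splits its bandwidth evenly between reading and writing. The function name and the parameter values are hypothetical.

```python
import math

def p_loss_during_rebuild(n, lam, c, b, scheme):
    """First-order probability that a second node failure causes data loss
    while the first failed node is being rebuilt (replication factor 2).

    n: number of nodes, lam: failure rate per node (1/hour),
    c: data per node (bytes), b: rebuild bandwidth per node (bytes/hour).
    """
    if scheme == "clustered":
        # Nodes are mirrored in pairs: only the partner holds the sole copies.
        exposed_nodes = 1
        rebuild_time = c / b
    elif scheme == "declustered":
        # Replicas are spread over all other nodes: any second failure loses data.
        exposed_nodes = n - 1
        # Assumption: each survivor uses half of b to read and half to write.
        rebuild_time = c / ((n - 1) * (b / 2))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # P(at least one exposed node fails during rebuild) = 1 - exp(-rate * time)
    return -math.expm1(-exposed_nodes * lam * rebuild_time)

# Illustrative (made-up) parameters: 64 nodes, one failure per 10,000 hours,
# 12 TB per node, 96 MB/s rebuild bandwidth.
n, lam = 64, 1e-4
c, b = 12e12, 96e6 * 3600
p_clustered = p_loss_during_rebuild(n, lam, c, b, "clustered")
p_declustered = p_loss_during_rebuild(n, lam, c, b, "declustered")
ratio = p_declustered / p_clustered
print(p_clustered, p_declustered, ratio)
```

Under these assumptions the declustered rebuild is about (n - 1)/2 times faster, but n - 1 times as many nodes are exposed, so the two effects nearly cancel and the ratio stays between one and two regardless of `n`.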
Year
2010
DOI
10.1109/MASCOTS.2010.17
Venue
MASCOTS
Keywords
higher replication factor, general placement scheme, declustered placement scheme, critical mode, storage node, replication factor, data loss, replica placement, large-scale data storage systems, possible placement scheme, alternate measure, different placement scheme, failure rate, data storage, time measurement, reliability, reliability theory, bandwidth, software fault tolerance
Field
Replica, Data loss, Computer data storage, Computer science, Software fault tolerance, Peer to peer computing, Failure rate, Real-time computing, Bandwidth (signal processing), Reliability theory, Distributed computing
DocType
Conference
Citations
9
PageRank
0.60
References
10
Authors
5
Name | Order | Citations | PageRank
Vinodh Venkatesan | 1 | 62 | 7.82
Ilias Iliadis | 2 | 189 | 16.16
Xiao Yu Hu | 3 | 1197 | 60.14
Robert Haas | 4 | 261 | 15.05
Christina Fragouli | 5 | 1880 | 173.19