Abstract | ||
---|---|---|
The success probability of I/O requests in the presence of failures is increased by a combination of failover mechanisms built into the storage server, multiple access paths from I/O clients to the server, and timeout-retry mechanisms at the client itself. We define and evaluate a unified availability metric, request failures per million (RFPM), which quantifies request failure probability while taking into account client-side as well as server-side mechanisms. We calculate this metric using a two-level model of the I/O service: a probability tree that captures the I/O driver behaviour, and a set of continuous-time Markov chain (CTMC) models that capture failover mechanisms at the server. The I/O driver model captures detailed timeout-retry mechanisms, including retries at multiple ports (“multipathing”). The server model captures transient phenomena such as failure detection, takeover, and the emulation behaviour of a paired storage controller. The model shows that client retry mechanisms significantly improve request success probability. The model is then used to study the sensitivity of RFPM to parameters such as timeouts, reboot time, and failure detection delay. The results show that the model can help answer what-if questions about how system parameters affect the request success rate. |
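
To make the RFPM metric concrete, the following is a minimal sketch rather than the paper's model. It assumes a two-state (up/down) server CTMC in steady state and treats every client attempt as independent, whereas the abstract describes a probability tree and transient server behaviour that this toy version ignores. The function names, rates, and retry counts are hypothetical and chosen only for illustration.

```python
def steady_state_unavailability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time a two-state (up/down) CTMC spends in the down state.

    Assumption: the server is either fully up or fully down, with exponential
    failure and repair times. The paper's server model is richer than this.
    """
    return failure_rate / (failure_rate + repair_rate)


def rfpm(p_attempt_fail: float, retries_per_path: int, num_paths: int) -> float:
    """Request failures per million, assuming independent attempts.

    A request fails only if the first try and every retry fail on every path.
    Independence is a simplifying assumption made for this sketch.
    """
    attempts = (retries_per_path + 1) * num_paths
    return 1e6 * p_attempt_fail ** attempts


if __name__ == "__main__":
    # Hypothetical rates: one failure per 99 time units of uptime, unit repair rate.
    p = steady_state_unavailability(failure_rate=1.0, repair_rate=99.0)
    for retries in (0, 1, 2):
        for paths in (1, 2):
            print(f"retries={retries} paths={paths} RFPM={rfpm(p, retries, paths):.6g}")
```

Even under these crude assumptions, each extra retry or path multiplies the failure probability by the per-attempt failure probability, which is the qualitative effect the abstract attributes to client retry mechanisms. In practice, retries issued while the server is still down are correlated, so a sketch like this tends to be optimistic.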
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/ISSRE.2013.6698906 | Software Reliability Engineering |
Keywords | Field | DocType |
Markov processes,input-output programs,probability,storage management,trees (mathematics),CTMC models,I/O clients,I/O driver model,continuous time Markov chain models,failover mechanisms,multipathing I/O request,multiple access paths,paired storage controller,probability tree,request failure probability,request failures per million,storage server,timeout-retry mechanisms,unified availability metric,Markov chains,analytical model,availability storage controllers | Reboot,Failover,File server,Markov process,Tree diagram,Continuous-time Markov chain,Computer science,Real-time computing,Input/output,Emulation,Reliability engineering,Distributed computing | Conference |
ISSN | Citations | PageRank |
1071-9458 | 0 | 0.34 |
References | Authors | |
4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gangadhar Enagandula | 1 | 0 | 0.34 |
Varsha Apte | 2 | 0 | 0.34 |
Bipul Raj | 3 | 0 | 0.34 |