Title
Reliability Analysis for Software Cluster Systems Based on Proportional Hazard Model
Abstract
With the widespread use of software cluster systems, their reliability is drawing increasing attention in both academia and industry. A cluster system is a kind of software load-sharing system (LSS) whose reliability depends significantly on its system software. Therefore, traditional reliability analysis methods for hardware LSSs are not applicable to cluster systems. In this paper, we develop a reliability analysis model for redundant cluster systems consisting of initial servers and cold standby servers that replace failed ones. The system reliability process is modeled as a state-based non-homogeneous Markov process (NHMP), where each state corresponds to a non-homogeneous Poisson process (NHPP). The NHPP arrival rate is expressed with Cox's proportional hazard model (PHM) in terms of the cumulative and instantaneous workload of the system software. Beyond redundant cluster systems without repair, the model can also be extended to analyze systems with restart. The analysis results can support cluster management and design decisions. Finally, evaluation experiments show the potential of our model.
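The abstract does not state the model's equations. As a minimal sketch, assuming the standard Cox form lambda(t) = lambda0 * exp(beta_cum * W(t) + beta_inst * w(t)) with a constant baseline lambda0, where w(t) is one server's instantaneous workload and W(t) its cumulative workload, the Python below computes the NHPP intensity for a single state and the probability of no failure arrival over a window, exp(-Lambda(t)). The coefficient values, the equal split of total load across active servers, and every function name are hypothetical illustrations, not the authors' parameters or method.

```python
import math
from typing import Callable

# Hypothetical coefficients; the paper's fitted values are not given in the abstract.
LAM0 = 1e-3       # constant baseline hazard (assumption)
BETA_CUM = 0.01   # effect of cumulative workload W(t), i.e. software aging (assumption)
BETA_INST = 0.05  # effect of instantaneous workload w(t) (assumption)


def phm_intensity(t: float,
                  inst_load: Callable[[float], float],
                  cum_load: Callable[[float], float]) -> float:
    """Cox-style intensity: baseline hazard times exp(linear covariate term)."""
    return LAM0 * math.exp(BETA_CUM * cum_load(t) + BETA_INST * inst_load(t))


def cumulative_hazard(intensity: Callable[[float], float], t: float,
                      steps: int = 10_000) -> float:
    """Trapezoidal approximation of Lambda(t), the integral of intensity over [0, t]."""
    h = t / steps
    total = 0.5 * (intensity(0.0) + intensity(t))
    for k in range(1, steps):
        total += intensity(k * h)
    return total * h


def no_failure_prob(total_load: float, n_active: int, t: float) -> float:
    """P(no NHPP failure arrival in [0, t]) for one server in a state where
    n_active servers share total_load equally (equal sharing is an assumption)."""
    w = total_load / n_active          # per-server instantaneous workload
    inst = lambda s: w                 # held constant within the state
    cum = lambda s: w * s              # cumulative workload grows linearly
    lam = lambda s: phm_intensity(s, inst, cum)
    return math.exp(-cumulative_hazard(lam, t))


if __name__ == "__main__":
    # Fewer active servers -> higher per-server load -> lower no-failure probability.
    for n in (4, 3, 2):
        print(n, round(no_failure_prob(total_load=40.0, n_active=n, t=10.0), 4))
```

In a state-based model like the one the abstract describes, each state (for example, the number of failures so far, bounded by the number of cold standbys) would carry its own intensity of this kind; how the paper chains the states and treats cumulative workload after a replacement or restart is not specified here.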
Year
2016
DOI
10.1109/COMPSAC.2016.177
Venue
2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC)
Keywords
cluster system, load-sharing system, cumulative workload, software reliability, software aging
Field
System software, Markov process, Computer science, Server, Real-time computing, Software, Software reliability testing, Software aging, Software quality, Software sizing, Reliability engineering
DocType
Conference
Volume
1
ISSN
0730-3157
ISBN
978-1-4673-8846-7
Citations
0
PageRank
0.34
References
16
Authors
4
Name           Order   Citations   PageRank
Chunyan Hou    1       0           0.68
Chen Chen      2       440         57.36
Jinsong Wang   3       8           3.15
Kai Shi        4       8           4.99