Title
Cost Reduction in High Power Computing Using a Deferred Repair Strategy: A Simulation Study
Abstract
Fault-tolerant systems that follow a repair-upon-failure strategy can become expensive in terms of labour and time. For homogeneous multi-server systems in particular, if no control hierarchy exists, postponing non-essential repairs can reduce these costs without significantly affecting the availability of the whole system. While repairs are postponed, however, the system must remain capable of handling user requests. For this purpose, a threshold value is usually defined that represents the minimum number of servers the system administrator should keep operative. Performability evaluation of such systems is very important since the systems are fault tolerant. This paper considers the simulation of large-scale multi-server systems with identical servers serving a stream of arriving jobs. The cost of running such systems under various deferred repair strategies is calculated and compared with the cost of using a repair-upon-failure strategy.
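The threshold idea described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation model: the function name `simulate`, the exponential failure times, the instantaneous repair visit that restores every failed server, and the two cost parameters (`visit_cost`, `degraded_cost`) are all assumptions made for illustration. Job arrivals and queueing are left out entirely. Setting the threshold to `n_servers - 1` corresponds to repair-upon-failure; lower thresholds defer repairs.

```python
import random


def simulate(n_servers, threshold, fail_rate, horizon,
             visit_cost, degraded_cost, seed=0):
    """Toy deferred-repair simulation (illustrative assumptions only).

    Each operative server fails after an exponential time with rate
    `fail_rate`. A repair visit is triggered as soon as the number of
    operative servers drops to `threshold`; the visit is assumed to be
    instantaneous and to restore all failed servers.
    """
    if not 0 <= threshold < n_servers:
        raise ValueError("threshold must satisfy 0 <= threshold < n_servers")

    rng = random.Random(seed)
    t = 0.0
    operative = n_servers
    visits = 0
    lost_server_hours = 0.0  # time-integral of the number of failed servers

    while True:
        # Time until the next failure among the operative servers.
        dt = rng.expovariate(operative * fail_rate)
        if t + dt > horizon:
            lost_server_hours += (n_servers - operative) * (horizon - t)
            break
        lost_server_hours += (n_servers - operative) * dt
        t += dt
        operative -= 1
        if operative <= threshold:
            # Hypothetical policy: one visit repairs everything at once.
            visits += 1
            operative = n_servers

    cost = visits * visit_cost + lost_server_hours * degraded_cost
    return {"visits": visits,
            "lost_server_hours": lost_server_hours,
            "cost": cost}


if __name__ == "__main__":
    # Parameter values are arbitrary and chosen only for the comparison.
    for label, k in (("repair-upon-failure", 9), ("deferred, threshold 7", 7)):
        result = simulate(n_servers=10, threshold=k, fail_rate=0.01,
                          horizon=10_000.0, visit_cost=100.0,
                          degraded_cost=1.0, seed=1)
        print(label, result)
```

In this toy model the deferred policy trades fewer repair visits for some lost server capacity, so which policy is cheaper depends on the assumed ratio of `visit_cost` to `degraded_cost`.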
Year
2008
DOI
10.1109/UKSIM.2008.122
Venue
UKSim
Keywords
control systems, availability, degradation, fault tolerant, fault tolerant system, monte carlo methods, cost function, computer simulation, computational modeling
Field
Computer science, Homogeneous, Server, Fault tolerance, System administrator, Control system, Hierarchy, Cost reduction, Reliability engineering
DocType
Conference
ISSN
2381-4772
Citations
1
PageRank
0.36
References
8
Authors
3
Name
Order
Citations
PageRank
Altan Koçyigit1228.09
Orhan Gemikonakli216330.24
Enver Ever314220.65