Abstract |
---|
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the system and users to negotiate a mutually desirable risk strategy; in order to accomplish this, the system makes probabilistic guarantees on quality of service (QoS), of the form, "Job j can be completed by deadline d with probability p." In order to make such guarantees, the system uses event prediction (forecasting) in conjunction with fault-aware job scheduling and cooperative checkpointing strategies. Using job logs and failure traces from actual high performance computing systems, we employ trace-based simulations to assess the effects of the prediction accuracy (a) and user risk strategy (U) on a variety of performance metrics. Compared to a system that does not use event prediction, a high forecasting accuracy resulted in QoS and utilization improvements of as much as 6%, along with an 89% reduction in the amount of lost work. Therefore, our results show that a system that makes probabilistic QoS guarantees using a market-based scheduling approach can increase both system performance and reliability. |
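
The guarantee form quoted in the abstract, "Job j can be completed by deadline d with probability p," can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the independence assumption between predicted failures, and the reading of the user risk strategy U as a minimum acceptable probability are all assumptions made here for illustration only.

```python
def success_probability(failure_probs):
    """Probability that none of the predicted failures occurs.

    Assumes (hypothetically) that the failures are independent; each entry
    is the predicted probability of one failure hitting the job before its
    deadline.
    """
    p = 1.0
    for f in failure_probs:
        p *= 1.0 - f
    return p


def negotiate(offers, user_risk):
    """Pick the earliest offer the user would accept.

    offers    -- list of (deadline, promised_probability) pairs
    user_risk -- hypothetical threshold U: the lowest promised probability
                 the user is willing to accept
    Returns the accepted (deadline, probability) pair, or None.
    """
    acceptable = [(d, p) for d, p in offers if p >= user_risk]
    return min(acceptable, default=None)


# Two predicted failures with probabilities 0.1 and 0.2 leave about a
# 0.72 chance that the job runs undisturbed.
p = success_probability([0.1, 0.2])

# A user who requires at least 0.9 skips the early, risky deadline.
choice = negotiate([(10, 0.5), (20, 0.9), (30, 0.99)], user_risk=0.9)
```

In this toy model a more risk-tolerant user (lower threshold) is offered an earlier deadline, which is the kind of trade-off a negotiated risk strategy is meant to expose.
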
Year | DOI | Venue |
---|---|---|
2005 | 10.1109/DSN.2005.80 | DSN |
Keywords | Field | DocType |
checkpointing,fault tolerant computing,probability,quality of service,scheduling,cooperative checkpointing strategies,event prediction,fault-aware job scheduling,market-based scheduling approach,probabilistic QoS,risk strategy,supercomputing systems,system performance,system reliability,trace-based simulation | Fair-share scheduling,Supercomputer,Computer science,Scheduling (computing),Quality of service,Real-time computing,Job scheduler,Probabilistic logic,Reliability engineering,Distributed computing | Conference |
ISSN | ISBN | Citations |
1530-0889 | 0-7695-2282-3 | 8 |
PageRank | References | Authors |
0.93 | 14 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Adam J. Oliner | 1 | 715 | 51.10 |
Larry Rudolph | 2 | 10 | 1.96 |
Ramendra K. Sahoo | 3 | 633 | 56.73 |
José E. Moreira | 4 | 2282 | 230.26 |
Madhusudan Gupta | 5 | 8 | 0.93 |