Title
Probabilistic QoS Guarantees for Supercomputing Systems
Abstract
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the system and users to negotiate a mutually desirable risk strategy; in order to accomplish this, the system makes probabilistic guarantees on quality of service (QoS), of the form, "Job j can be completed by deadline d with probability p." In order to make such guarantees, the system uses event prediction (forecasting) in conjunction with fault-aware job scheduling and cooperative checkpointing strategies. Using job logs and failure traces from actual high performance computing systems, we employ trace-based simulations to assess the effects of the prediction accuracy (a) and user risk strategy (U) on a variety of performance metrics. Compared to a system that does not use event prediction, a high forecasting accuracy resulted in QoS and utilization improvements of as much as 6%, along with an 89% reduction in the amount of lost work. Therefore, our results show that a system that makes probabilistic QoS guarantees using a market-based scheduling approach can increase both system performance and reliability.
Year
2005
DOI
10.1109/DSN.2005.80
Venue
DSN
Keywords
checkpointing, fault tolerant computing, probability, quality of service, scheduling, cooperative checkpointing strategies, event prediction, fault-aware job scheduling, market-based scheduling approach, probabilistic QoS, risk strategy, supercomputing systems, system performance, system reliability, trace-based simulation
Field
Fair-share scheduling, Supercomputer, Computer science, Scheduling (computing), Quality of service, Real-time computing, Job scheduler, Probabilistic logic, Reliability engineering, Distributed computing
DocType
Conference
ISSN
1530-0889
ISBN
0-7695-2282-3
Citations
8
PageRank
0.93
References
14
Authors
5
Name               Order  Citations  PageRank
Adam J. Oliner     1      715        51.10
Larry Rudolph      2      10         1.96
Ramendra K. Sahoo  3      633        56.73
José E. Moreira    4      2282       230.26
Madhusudan Gupta   5      8          0.93