Fault-aware job scheduling for BlueGene/L systems - Citegraph

Paper Info

Title
Fault-aware job scheduling for BlueGene/L systems

Abstract
Summary form only given. Large-scale systems such as BlueGene/L are susceptible to a variety of software and hardware failures that can degrade system performance. We evaluate the effectiveness of a previously developed job scheduling algorithm for BlueGene/L in the presence of faults. We also develop two new job scheduling algorithms that take failures into account when scheduling jobs, and evaluate their impact on average bounded slowdown, average response time, and system utilization under different levels of the proactive failure prediction and prevention techniques reported in the literature. Our simulation studies show that these new algorithms can significantly improve the performance of the BlueGene/L system even when fault prediction confidence or accuracy is very low (as low as 10%).
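The abstract reports three metrics without defining them. The sketch below computes them using definitions that are common in the parallel job scheduling literature. It is not the paper's simulator. Assumptions: bounded slowdown is max((wait + run) / max(run, Γ), 1) with a threshold Γ = 10 s; response time is wait plus run time; utilization is measured over the window from the first arrival to the last completion. Failures, restarts, and lost work are ignored. The names `Job`, `bounded_slowdown`, `response_time`, `utilization`, and `summarize` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    arrival: float  # submission time (s)
    start: float    # time the job began running (s)
    runtime: float  # execution time (s)
    nodes: int      # number of nodes allocated


def bounded_slowdown(job: Job, gamma: float = 10.0) -> float:
    """max((wait + run) / max(run, gamma), 1); gamma (assumed 10 s) damps very short jobs."""
    wait = job.start - job.arrival
    return max((wait + job.runtime) / max(job.runtime, gamma), 1.0)


def response_time(job: Job) -> float:
    """Time from submission to completion: wait time plus run time."""
    return (job.start - job.arrival) + job.runtime


def utilization(jobs: list[Job], total_nodes: int) -> float:
    """Fraction of available node-seconds used between first arrival and last completion."""
    t0 = min(j.arrival for j in jobs)
    t1 = max(j.start + j.runtime for j in jobs)
    used = sum(j.nodes * j.runtime for j in jobs)
    return used / (total_nodes * (t1 - t0))


def summarize(jobs: list[Job], total_nodes: int, gamma: float = 10.0) -> dict[str, float]:
    """Average bounded slowdown, average response time, and utilization for a finished trace."""
    n = len(jobs)
    return {
        "avg_bounded_slowdown": sum(bounded_slowdown(j, gamma) for j in jobs) / n,
        "avg_response_time": sum(response_time(j) for j in jobs) / n,
        "utilization": utilization(jobs, total_nodes),
    }


if __name__ == "__main__":
    # Hypothetical three-job trace on a hypothetical 128-node machine.
    trace = [Job(0, 0, 100, 64), Job(0, 100, 50, 128), Job(10, 10, 5, 32)]
    print(summarize(trace, total_nodes=128))
```

For this toy trace the averages work out to a bounded slowdown of 5/3 and a response time of 85 s, with a utilization of 0.675. The third job runs for only 5 s. Without the Γ threshold its slowdown would be computed against that tiny runtime, so the bound keeps short jobs from dominating the average.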

Year: 2004
DOI: 10.1109/IPDPS.2004.1302991
Keywords: parallel machines, performance evaluation, processor scheduling, system recovery, BlueGene/L systems, average response time, fault-aware job scheduling algorithm, proactive failure prediction, system utilization
Field: Fixed-priority pre-emptive scheduling, Fair-share scheduling, Computer science, Parallel computing, Two-level scheduling, Least slack time scheduling, Rate-monotonic scheduling, Dynamic priority scheduling, Earliest deadline first scheduling, Round-robin scheduling, Distributed computing
DocType: Conference
ISBN: 0-7695-2132-0
Citations: 45
PageRank: 3.07
References: 15
Authors: 5

Authors

Name	Order	Citations	PageRank
Adam J. Oliner	1	715	51.10
Ramendra K. Sahoo	2	633	56.73
José E. Moreira	3	2282	230.26
Manish Gupta	4	241	27.47
Anand Sivasubramaniam	5	4485	291.86
