Abstract | ||
---|---|---|
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. Because these machines are typically "space-shared," each job must wait in a queue until sufficient processor resources become available to service it. In production computing settings, the queuing delay (experienced by users as the time between when the job is submitted and when it begins execution) is highly variable. Users often find this variability a drag on productivity as it makes planning difficult and intellectual continuity hard to maintain. In this work, we introduce an on-line system for predicting batch-queue delay and show that it generates correct and accurate bounds for queuing delay for batch jobs from 11 machines over a 9-year period. Our system comprises 4 novel and interacting components: a predictor based on nonparametric inference; an automated change-point detector; machine-learned, model-based clustering of jobs having similar characteristics; and an automatic downtime detector to identify systemic failures that affect job queuing delay. We compare the correctness and accuracy of our system against various previously used prediction techniques and show that our new method outperforms them for all machines we have available for study. |
Year | DOI | Venue |
---|---|---|
2007 | 10.1145/1254882.1254939 | SIGMETRICS |
Keywords | Field | DocType |
super-computing,time series,batch job,automatic downtime detector,queue bounds estimation,queue prediction,batch-queue delay,sufficient processor resource,batch scheduling,production computing setting,9-year period,high-performance computing center,on-line system,automated change-point detector,processor allocation | Bulk queue,M/M/c queue,Multilevel feedback queue,Computer science,Queuing delay,Queue,M/G/1 queue,Real-time computing,Priority queue,Job scheduler,Distributed computing | Conference |
Volume | Issue | ISSN |
35 | 1 | 0163-5999 |
ISBN | Citations | PageRank |
3-540-78698-8 | 52 | 2.51 |
References | Authors | |
16 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dan Nurmi | 1 | 101 | 5.66 |
John Brevik | 2 | 679 | 42.60 |
Rich Wolski | 3 | 4126 | 429.97 |