Title
Practical Resource Usage Prediction Method for Large Memory Jobs in HPC Clusters.
Abstract
Users of high performance computing (HPC) clusters often struggle to specify accurate resource estimates when running their applications as batch jobs. Prediction is a common way to reduce this burden: historical records of previous runs are used to estimate the resource usage of incoming jobs. Most existing resource prediction methods build a single model that covers all jobs in a cluster. In production, however, people tend to care only about the resource usage of jobs with certain patterns, e.g. jobs with large memory consumption. This paper proposes a practical resource prediction method for large memory jobs. The proposed method first predicts whether a job is likely to use a large amount of memory, and then predicts the final memory usage with a model trained only on historical large memory jobs. Using several real-world job traces collected from large production clusters at IBM Spectrum LSF customer sites, the evaluation shows that average prediction errors can be reduced by up to 40% for nearly 90% of large memory jobs. Meanwhile, the model training cost can be reduced by over 30% for the evaluated job traces.
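The two-stage idea in the abstract (first decide whether a job is likely to be a large memory job, then estimate its memory with a model trained only on historical large memory jobs) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the paper's method: the 32 GB threshold, the `(user, application)` job key, the 0.5 decision cutoff, and the stand-in "models" (a per-key ratio for stage 1 and a per-key mean for stage 2). The abstract does not state which features or learning algorithms the authors use.

```python
from collections import defaultdict
from statistics import mean

LARGE_MEM_GB = 32.0  # hypothetical threshold; the abstract does not give one


class TwoStageMemoryPredictor:
    """Stage 1: is the job likely large-memory? Stage 2: how much memory?"""

    def fit(self, history):
        # history: iterable of (key, peak_mem_gb). The key is a hypothetical
        # job signature such as (user, application).
        all_jobs = defaultdict(list)
        for key, mem in history:
            all_jobs[key].append(mem)
        # Stage 1 stand-in: per-key fraction of past runs above the threshold.
        self.large_ratio = {
            k: sum(m >= LARGE_MEM_GB for m in v) / len(v)
            for k, v in all_jobs.items()
        }
        # Stage 2 stand-in: fitted only on historical large-memory jobs.
        self.large_mean = {
            k: mean(m for m in v if m >= LARGE_MEM_GB)
            for k, v in all_jobs.items()
            if any(m >= LARGE_MEM_GB for m in v)
        }
        return self

    def predict(self, key):
        # Returns an estimate in GB, or None when the job is not expected
        # to be large-memory (such jobs are left to some other policy).
        if self.large_ratio.get(key, 0.0) < 0.5:  # assumed cutoff
            return None
        return self.large_mean[key]


# Toy history, invented purely for illustration.
history = [
    (("alice", "sim"), 64.0),
    (("alice", "sim"), 72.0),
    (("alice", "sim"), 2.0),
    (("bob", "post"), 1.5),
    (("bob", "post"), 2.5),
]
model = TwoStageMemoryPredictor().fit(history)
print(model.predict(("alice", "sim")))  # large-memory path
print(model.predict(("bob", "post")))   # not predicted large
```

Because the second stage only ever sees large memory jobs, its training set is smaller than the full trace, which is consistent with the reduced training cost the abstract reports; the sketch does not attempt to reproduce those numbers.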
Year
2019
DOI
10.1007/978-3-030-18645-6_1
Venue
Lecture Notes in Computer Science
Keywords
Resource usage prediction, Large memory jobs, Resource manager
DocType
Conference
Volume
11416
ISSN
0302-9743
Citations
1
PageRank
0.34
References
0
Authors
4
Name
Order
Citations
PageRank
Xiuqiao Li1515.74
Nan Qi210.34
Yuanyuan He3216.11
Bill McMillan410.34