Title
A Novel Classification Model to Predict Batch Job Failures in Co-located Cloud
Abstract
Cloud co-location is now widely used in data centers to improve the utilization of computing resources. However, batch jobs in a Co-location Datacenter (CLD) are vulnerable to failures because they compete with online service jobs for limited resources. Failed batch jobs are rescheduled and fail repeatedly, which wastes computing resources and destabilizes the computing clusters. We therefore propose a method to accurately predict potential batch job failures in a CLD. The core of the proposed method is STLF (SMOTE Tomek and LightGBM [5] Framework), which consists of three parts. First, a co-feature extraction method generates the Co-located Feature Dataset (CLFD). Then SMOTE Tomek is used to oversample the CLFD so that the classifier can learn more minority-class features. Finally, a LightGBM classifier predicts batch job failures. Experiments on the Ali Trace 2018 dataset show that the proposed STLF significantly outperforms existing popular classifiers in terms of the ROC curve, the area under the ROC curve (AUC), precision, and recall.
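The resampling step named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the two-feature toy data, the label convention (1 = failed job, the minority class), and the choice to drop both members of each Tomek link are all assumptions made for illustration. The co-feature extraction and LightGBM stages are not shown.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples are available."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def nearest_index(X, i):
    """Index of the sample closest to X[i], excluding i itself."""
    return min(
        (j for j in range(len(X)) if j != i),
        key=lambda j: euclidean(X[i], X[j]),
    )


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of samples from
    different classes that are each other's nearest neighbour."""
    nn = [nearest_index(X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count with SMOTE,
    then clean the combined set by removing Tomek links."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count - minority count
    synthetic = smote(minority, n_new, k, rng)
    return remove_tomek_links(X + synthetic, y + [minority_label] * len(synthetic))


# Hypothetical toy data: [cpu_request, mem_request] per batch job.
X_toy = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.0, 0.4], [0.4, 0.0],
         [0.25, 0.15], [5.0, 5.0], [5.5, 5.2], [5.2, 5.6]]
y_toy = [0] * 6 + [1] * 3  # 0 = finished normally, 1 = failed (minority)
X_bal, y_bal = smote_tomek(X_toy, y_toy)
print("class counts after resampling:", y_bal.count(0), y_bal.count(1))
```

In practice one would more likely reach for an existing implementation such as `SMOTETomek` from the imbalanced-learn package and feed the resampled data to `LGBMClassifier` from LightGBM; whether the paper used those particular libraries is not stated in the abstract.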
Year: 2020
DOI: 10.1109/ICPADS51040.2020.00080
Venue: 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)
Keywords: cloud computing, co-located datacenter, failure prediction, resource efficiency, datacenter
DocType: Conference
ISSN: 1521-9097
ISBN: 978-1-7281-8382-4
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name | Order | Citations | PageRank
Yurui Li | 1 | 0 | 0.34
Weiwei Lin | 2 | 8 | 1.85
Keqin Li | 3 | 2778 | 242.13
James Z. Wang | 4 | 0 | 0.34
Fagui Liu | 5 | 23 | 6.06
Jie Liu | 6 | 199 | 22.56