Title
Operation And Maintenance (O&M) For Data Center: An Intelligent Anomaly Detection Approach
Abstract
With the growing popularity of cloud services and big data among Internet enterprises, the operation and maintenance (O&M) of data centers has become increasingly critical. The key to O&M is detecting anomalies in indicator data. However, traditional O&M suffers from poor efficiency, low satisfaction, and a low degree of intelligence. Introducing a machine-learning-based anomaly detection method into O&M can improve real-time detection accuracy and make O&M more intelligent. Focusing on two typical O&M problems currently prevalent in Internet enterprises, stability detection and unattended release of cluster machines, this paper proposes an intelligent anomaly detection approach called Ensemble Learning on Partition Interval (ELPI). First, the data set is divided into stable intervals and unsteady intervals. Then an online/offline algorithm module is established, and ensemble learning suited to the characteristics of each interval is performed to detect abnormal data. Finally, a self-feedback mechanism dynamically adjusts the module threshold. The results show that the method is more accurate and stable than traditional methods. The method has also been applied effectively to anomaly detection for large clusters and app releases.
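The three steps named in the abstract (interval partitioning, per-interval ensemble detection, threshold feedback) can be illustrated with a minimal sketch. This is not the paper's ELPI implementation, which the abstract does not specify. The fixed-size chunks, the MAD-based stability rule, the two detectors, the any/all voting rule, and every function and parameter name below are assumptions chosen only for illustration.

```python
from statistics import mean, median, pstdev


def robust_spread(values):
    """Median absolute deviation (MAD), a spread measure that outliers barely move."""
    mid = median(values)
    return median(abs(v - mid) for v in values)


def partition(series, size=10, limit=0.05):
    """Split the series into fixed-size chunks and label each one.

    Assumed rule: a chunk is 'stable' when MAD / |median| <= limit.
    """
    chunks = []
    for start in range(0, len(series), size):
        chunk = series[start:start + size]
        mid = median(chunk)
        ratio = robust_spread(chunk) / abs(mid) if mid else float("inf")
        chunks.append((start, chunk, "stable" if ratio <= limit else "unsteady"))
    return chunks


def zscore_votes(chunk, threshold):
    """Detector 1 (assumed): distance from the chunk mean in standard deviations."""
    mu, sigma = mean(chunk), pstdev(chunk)
    return [sigma > 0 and abs(v - mu) / sigma > threshold for v in chunk]


def mad_votes(chunk, threshold):
    """Detector 2 (assumed): distance from the chunk median in MAD units."""
    mid, mad = median(chunk), robust_spread(chunk)
    return [mad > 0 and abs(v - mid) / mad > threshold for v in chunk]


def detect(series, threshold=2.5, size=10, limit=0.05):
    """Return indices flagged as anomalous.

    Assumed ensemble rule: in stable chunks one vote is enough,
    in unsteady chunks both detectors must agree.
    """
    anomalies = []
    for start, chunk, label in partition(series, size, limit):
        votes = zip(zscore_votes(chunk, threshold), mad_votes(chunk, threshold))
        combine = any if label == "stable" else all
        anomalies += [start + i for i, pair in enumerate(votes) if combine(pair)]
    return anomalies


def adjust_threshold(threshold, false_alarms, missed, step=0.1, low=1.0, high=5.0):
    """Assumed feedback rule: raise the threshold after false alarms, lower it after misses."""
    threshold += step * (false_alarms - missed)
    return min(high, max(low, threshold))


# Illustrative data: a flat chunk with one spike, then an oscillating chunk.
series = [100, 101, 99, 100, 102, 100, 99, 101, 100, 150,
          100, 140, 90, 160, 80, 150, 95, 145, 85, 155]
print(detect(series))  # -> [9]
```

In this sketch the spike at index 9 is flagged because it sits in a chunk labeled stable, while the large swings in the second chunk are treated as normal behavior for an unsteady interval. A feedback call such as `adjust_threshold(2.5, false_alarms=3, missed=1)` would nudge the shared threshold upward after operators report more false alarms than misses.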
Year: 2021
DOI: 10.1016/j.comcom.2021.06.030
Venue: COMPUTER COMMUNICATIONS
Keywords: Intelligent anomaly detection, Operation and maintenance, Machine learning, Big data
DocType: Journal
Volume: 178
ISSN: 0140-3664
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order   Citations   PageRank
Xisheng Xiao    1       0           0.34
Jin Sun         2       7           4.49
Jinxin Yang     3       0           0.34