Title
Online anomaly detection framework for Spark systems via stage-task behavior modeling
Abstract
With the rapid growth of Big Data, Apache Spark has come into widespread use. However, as system scale grows, application delays caused by abnormal tasks or nodes have become a common problem in Spark systems. In this paper, we propose an anomaly detection approach based on stage-task behavior modeling. First, we assume that the abnormal behavior of tasks reflects the abnormal state of the node on which they run. Then, from the collected Spark runtime logs, we extract a four-dimensional feature vector related to task execution status and classify task behaviors as normal or abnormal; abnormal nodes are then discovered from the distribution of abnormal tasks. We build the online framework on Spark Streaming, and it can integrate offline learning methods such as logistic regression, a simple and powerful classifier for low-dimensional feature vectors. Our experiments show that the accuracy of real-time anomaly detection reaches about 91%, and the presented cases show that the framework is effective for detecting abnormal nodes.
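The pipeline described in the abstract (a four-dimensional task feature vector, a logistic regression classifier, then node-level detection from the distribution of abnormal tasks) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the four features, so the ones below (duration ratio, GC-time ratio, shuffle-read ratio, locality penalty) are hypothetical, the training data is a toy set, and the `min_ratio` rule for flagging a node is an assumption. Plain-Python gradient descent stands in for whatever offline learner the framework integrates.

```python
import math
from collections import Counter

# Hypothetical 4-dimensional task features, each scaled to [0, 1]:
# [duration_ratio, gc_time_ratio, shuffle_read_ratio, locality_penalty]
TRAIN_X = [
    [0.10, 0.05, 0.20, 0.0],
    [0.15, 0.10, 0.25, 0.0],
    [0.20, 0.05, 0.30, 0.5],
    [0.12, 0.08, 0.22, 0.0],
    [0.90, 0.70, 0.80, 1.0],
    [0.85, 0.60, 0.90, 0.5],
    [0.95, 0.80, 0.70, 1.0],
    [0.80, 0.75, 0.85, 1.0],
]
TRAIN_Y = [0, 0, 0, 0, 1, 1, 1, 1]  # toy labels: 1 = abnormal task


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Estimated probability that a task with features x is abnormal."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Offline step: fit logistic regression with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_abnormal(w, b, x, threshold=0.5):
    """Classify a single task as abnormal (True) or normal (False)."""
    return score(w, b, x) >= threshold


def abnormal_nodes(tasks, w, b, min_ratio=0.5):
    """Flag nodes whose share of abnormal tasks reaches min_ratio (assumed rule)."""
    total, bad = Counter(), Counter()
    for node, feats in tasks:
        total[node] += 1
        if is_abnormal(w, b, feats):
            bad[node] += 1
    return sorted(n for n in total if bad[n] / total[n] >= min_ratio)


# Toy batch of (node, features) pairs, as one streaming micro-batch might deliver.
TASKS = [
    ("node-a", [0.10, 0.05, 0.20, 0.0]),
    ("node-a", [0.15, 0.10, 0.25, 0.0]),
    ("node-b", [0.90, 0.70, 0.80, 1.0]),
    ("node-b", [0.95, 0.80, 0.70, 1.0]),
    ("node-b", [0.12, 0.08, 0.22, 0.0]),
]

w, b = train_logistic(TRAIN_X, TRAIN_Y)
print(abnormal_nodes(TASKS, w, b))
```

In an online setting, the trained weights would be applied to each incoming batch of task records rather than to a fixed list; the per-node ratio is only one possible way to read "the distribution of abnormal tasks".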
Year: 2018
DOI: 10.1145/3203217.3203265
Venue: Computing Frontiers Conference
Keywords: Spark system, Realtime anomaly detection, Offline logical regression, Feature extraction
Field: Offline learning, Data mining, Anomaly detection, Feature vector, Spark (mathematics), Computer science, Abnormality, Feature extraction, Real-time computing, Classifier (linguistics), Big data
DocType: Conference
Citations: 1
PageRank: 0.36
References: 3
Authors: 3
Name, Order, Citations, PageRank
Rui Ren, 1, 39, 6.66
Shuai Tian, 2, 1, 0.36
Lei Wang, 3, 577, 46.85