Title
Anomaly Detection Using Program Control Flow Graph Mining from Execution Logs
Abstract
We focus on the problem of detecting anomalous run-time behavior of distributed applications from their execution logs. Specifically, we mine templates and template sequences from logs to form a control flow graph (CFG) spanning distributed components. This CFG represents the baseline healthy system state and is used to flag deviations from expected behavior in runtime logs. The novelty of our work stems from new techniques that: (1) overcome the instrumentation requirements or application-specific assumptions made in prior log mining approaches, (2) improve the accuracy of mined templates in the presence of long parameters, and the accuracy of the mined CFG in the presence of a high degree of interleaving, and (3) improve the scalability of the CFG mining process by orders of magnitude in terms of the volume of log data that can be processed per day. We evaluate our approach using (a) synthetic log traces and (b) multiple real-world log datasets collected at different layers of the application stack. Results demonstrate that our template mining, CFG mining, and anomaly detection algorithms have high accuracy. The distributed implementation of our pipeline is highly scalable: it can process more than 500 GB of log data per day, even on a cluster of 10 low-end VMs running Spark and Hadoop. We also demonstrate the efficacy of our end-to-end system with a case study on the OpenStack VM provisioning system.
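The three stages named in the abstract (mine templates, link template sequences into a CFG, flag deviations from it) can be illustrated with a deliberately simplified Python sketch. It is not the authors' algorithm. It assumes log lines have already been grouped into per-request sessions, whereas the paper's method copes with interleaved logs. It treats any token containing a digit as a parameter, and its names (`to_template`, `mine_cfg`, `find_anomalies`, `min_support`) are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set, Tuple

Template = Tuple[str, ...]
Edge = Tuple[Template, Template]

# Hypothetical sentinels marking the start and end of every session.
START: Template = ("<START>",)
END: Template = ("<END>",)

# Simplifying assumption: a token containing any digit is a variable parameter.
_PARAM = re.compile(r"\d")


def to_template(message: str) -> Template:
    """Mask tokens that look like parameters (contain a digit) with '*'."""
    return tuple("*" if _PARAM.search(tok) else tok for tok in message.split())


def _edges(session: List[str]) -> List[Edge]:
    """Map a session's messages to templates and return consecutive pairs."""
    templates = [START] + [to_template(m) for m in session] + [END]
    return list(zip(templates, templates[1:]))


def mine_cfg(sessions: List[List[str]], min_support: int = 1) -> Set[Edge]:
    """Count template transitions across healthy sessions and keep
    those observed at least `min_support` times as CFG edges."""
    counts: Counter = Counter()
    for session in sessions:
        counts.update(_edges(session))
    return {edge for edge, n in counts.items() if n >= min_support}


def find_anomalies(session: List[str], cfg: Set[Edge]) -> List[Edge]:
    """Return the transitions in a new session that the baseline CFG lacks."""
    return [edge for edge in _edges(session) if edge not in cfg]


if __name__ == "__main__":
    healthy = [
        ["request 17 received", "allocating vm-42 on host 3", "vm-42 active"],
        ["request 18 received", "allocating vm-43 on host 1", "vm-43 active"],
    ]
    cfg = mine_cfg(healthy, min_support=2)
    # This session never logs its completion message.
    bad = ["request 19 received", "allocating vm-44 on host 2"]
    print(find_anomalies(bad, cfg))
    # -> [(('allocating', '*', 'on', 'host', '*'), ('<END>',))]
```

Transitions seen fewer than `min_support` times in healthy logs are left out of the baseline, so rare noise does not become "expected" behavior. The start and end sentinels let a truncated flow, such as a provisioning request that never reports completion, surface as an unexpected transition into the end marker.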
Year
2016
DOI
10.1145/2939672.2939712
Venue
KDD
Field
Data mining, Anomaly detection, Data processing, Spark (mathematics), Control flow graph, Computer science, Provisioning, Template, Interleaving, Scalability
DocType
Conference
Citations
16
PageRank
0.69
References
19
Authors
5
Name                      Order  Citations  PageRank
Animesh Nandi             1      1140       76.25
Atri Mandal               2      17         3.42
Shubham Atreja            3      16         2.38
Gargi Dasgupta            4      334        29.23
Subhrajit Bhattacharya    5      462        36.93