Title
HybridTune: Spatio-Temporal Performance Data Correlation for Performance Diagnosis of Big Data Systems.
Abstract
With rapidly growing interest in Big Data, improving the performance of Big Data systems is becoming increasingly important. The first step is to analyze and diagnose the performance bottlenecks of these systems. Currently, there are two major solutions. One is the pure data-driven diagnosis approach, which can be very time-consuming; the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications such as Spark workloads, we observe that the tasks in the same stage normally execute the same or similar code on each data partition. Based on this stage similarity and the distributed characteristics of Big Data systems, we analyze the behavior of Big Data applications in terms of both system and micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies, such as straggler tasks, task assignment imbalance, data skew, abnormal nodes, and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool named HybridTune, and measure its overhead and anomaly detection effectiveness using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of the outlier detection algorithm reaches up to 93%. Finally, we report several use cases that diagnose Spark and Hadoop workloads using BigDataBench, which demonstrate the potential use of HybridTune.
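The stage-similarity observation in the abstract can be illustrated with a small sketch. This is not HybridTune's implementation: the function names, the 1.5x median rule, and the 3.5 modified z-score cutoff are all assumptions chosen for illustration. The idea is that, if tasks in one stage run similar code on similar partitions, a task whose duration or metric departs strongly from the stage median is a candidate straggler or outlier.

```python
from statistics import median


def find_stragglers(durations, ratio=1.5):
    """Rule-style check: indices of tasks slower than ratio * stage median.

    The ratio of 1.5 is an illustrative assumption, not a value from the paper.
    """
    if not durations:
        return []
    med = median(durations)
    return [i for i, d in enumerate(durations) if d > ratio * med]


def mad_outliers(values, threshold=3.5):
    """Statistical check: indices whose modified z-score exceeds threshold.

    Uses the median absolute deviation (MAD), which is robust to the very
    outliers being searched for. The threshold of 3.5 is a common convention
    and an assumption here.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing to rank against.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical task durations (seconds) for one stage.
    stage_durations = [10, 11, 9, 10, 30]
    print(find_stragglers(stage_durations))
    print(mad_outliers(stage_durations))
```

Both checks compare each task only against its peers in the same stage, which is why no per-application baseline is needed in this sketch. The same functions could be applied to any per-task metric, under the assumption that the metric is comparable across tasks of a stage.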
Year
2019
DOI
10.1007/s11390-019-1968-y
Venue
Journal of Computer Science and Technology
Keywords
Big Data system, spatio-temporal correlation, rule-based diagnosis, machine learning
Field
Data mining, Anomaly detection, Use case, Spark (mathematics), Computer science, Outlier, Skew, Big data, Performance improvement, Theory of computation, Distributed computing
DocType
Journal
Volume
34
Issue
6
ISSN
1000-9000
Citations
0
PageRank
0.34
References
0
Authors
7
Name            Order   Citations   PageRank
Rui Ren         1       39          6.66
Jiechao Cheng   2       0           0.34
Xiwen He        3       3           1.59
Lei Wang        4       577         46.85
Jianfeng Zhan   5       767         62.86
Wanling Gao     6       299         19.12
Chunjie Luo     7       434         21.86