Title
Learning-Based Characterizing And Modeling Performance Bottlenecks Of Big Data Workloads
Abstract
With the increasing demand for large-scale data analytics, understanding the performance bottlenecks of big data workloads has become critical for optimizing distributed platforms. Existing work has focused on qualitatively characterizing the behavior and performance of workloads. However, little effort has been spent on quantifying performance bottlenecks and building bottleneck models. In this paper, we define a series of bottleneck ratios to quantify bottlenecks according to resource utilization. Then, based on features parsed from the original logs, we propose a stage-level modeling approach to characterize workload bottlenecks. With these models, we can estimate bottleneck ratios from the original logs alone, without collecting resource utilization data. To generalize the models to diverse workloads, we propose a workload generator, TrainBench, which can flexibly generate workloads with varied behaviors at the stage level. In addition, to take hardware performance into account, three key features are extracted to improve estimation accuracy. Our bottleneck models perform well for diverse workloads in different clusters.
Year: 2016
DOI: 10.1109/HPCC-SmartCity-DSS.2016.123
Venue: PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS)
Field: Resource management, Bottleneck, Data mining, Data modeling, Data analysis, Computer science, Workload, Parsing, Big data, Benchmark (computing), Distributed computing
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name            Order  Citations  PageRank
Zhongxin Guo    1      0          0.34
Zheng Hu        2      50         6.50
Chunhong Zhang  3      14         6.37
Youer Pu        4      16         1.61