Title
A clustering-based sampling method for building query response time models.
Abstract
Predicting query response time is fundamental to many database system management tasks, such as query scheduling, query progress visualization, system sizing, and load balancing. Query interaction, the phenomenon in which a query's response time is accelerated or decelerated by concurrently running queries, has to be taken into account when building models for predicting query response time. Because query interactions change over time and are hard to describe with analytical models, statistical models have been proposed that achieve better performance by describing query interactions in terms of statistics of query mixes, where a query mix is a set of concurrently running queries. The high multi-programming level (MPL) of modern data centers leads to an explosive space of query mixes, which makes training statistical models costly, especially in pay-as-you-go cloud computing settings. To address this issue, we propose a clustering-based sampling method that reduces sampling cost while maintaining the accuracy of the statistical models. High-quality samples are selected to cover all the clusters, so that they are representative in terms of query interactions. Query rating is introduced as the feature vector of a query and is transformed into the feature vector of a query mix for clustering. Experimental evaluation with TPC-H queries shows that the proposed method reduces sampling cost by 33% while maintaining the accuracy of the statistical models.
Year: 2017
Venue: COMPUTER SYSTEMS SCIENCE AND ENGINEERING
Keywords: Clustering-based Sampling, Statistical Modeling, Performance Prediction, Elastic Resource Management
Field: Data mining, Computer science, Response time, Sampling (statistics), Cluster analysis, Distributed computing
DocType: Journal
Volume: 32
Issue: 4
ISSN: 0267-6192
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name: Jinwen Zhang; Order: 1; Citations: 0; PageRank: 0.68
Name: Baoning Niu; Order: 2; Citations: 53; PageRank: 7.37