Title
Optimal load shedding with aggregates and mining queries
Abstract
To cope with bursty arrivals of high-volume data, a Data Stream Management System (DSMS) has to shed load while minimizing the degradation of Quality of Service (QoS). In this paper, we show that this problem can be formalized as a classical optimization task from operations research, in ways that accommodate different requirements for multiple users, different query sensitivities to load shedding, and different penalty functions. Standard non-linear programming algorithms are adequate for non-critical situations, but for severe overloads, we propose a more efficient algorithm that runs in linear time, without compromising optimality. Our approach is applicable to a large class of queries including traditional SQL aggregates, statistical aggregates (e.g., quantiles), and data mining functions, such as k-means, naive Bayesian classifiers, decision trees, and frequent pattern discovery (where we can even specify a different error bound for each pattern). In fact, we show that these aggregate queries are special instances of a broader class of functions that we call reciprocal-error aggregates, for which the proposed methods apply with full generality. Finally, we propose a novel architecture for supporting load shedding in an extensible system, where users can write arbitrary User-Defined Aggregates (UDAs), and thus confirm our analytical findings with several experiments executed on an actual DSMS.
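To make the optimization view in the abstract concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm. It assumes, purely for illustration, that query i incurs a penalty a[i] / x[i] when a fraction x[i] of its tuples is kept, that processing that fraction costs b[i] * x[i], and that total cost must not exceed a capacity. The function name allocate_keep_rates and the parameters a, b, and capacity are hypothetical. The paper's actual definition of reciprocal-error aggregates and its penalty functions may differ.

```python
import math

def allocate_keep_rates(a, b, capacity):
    """Minimize sum(a[i] / x[i]) subject to sum(b[i] * x[i]) <= capacity
    and 0 < x[i] <= 1. Assumes a[i] > 0, b[i] > 0, capacity > 0.
    Illustrative model only; not taken from the paper."""
    n = len(a)
    # No overload: every query can keep all of its tuples.
    if sum(b) <= capacity:
        return [1.0] * n

    x = [0.0] * n
    free = set(range(n))      # queries whose keep rate is still undecided
    remaining = capacity      # capacity left for the undecided queries
    while free:
        # Lagrange condition without the upper bound gives
        # x[i] proportional to sqrt(a[i] / b[i]).
        scale = remaining / sum(math.sqrt(a[i] * b[i]) for i in free)
        over = [i for i in free if math.sqrt(a[i] / b[i]) * scale > 1.0]
        if not over:
            for i in free:
                x[i] = math.sqrt(a[i] / b[i]) * scale
            break
        # Peg queries that would exceed a keep rate of 1, then re-solve
        # the smaller problem with the capacity they consume removed.
        for i in over:
            x[i] = 1.0
            remaining -= b[i]
            free.remove(i)
    return x
```

For example, allocate_keep_rates([4.0, 1.0], [1.0, 1.0], 1.5) keeps all tuples of the first query and half of the second, because the first query is assumed to be four times as sensitive to shedding. The closed form replaces a general nonlinear solver only because the assumed penalty is a simple reciprocal; other penalty shapes would need a different derivation.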
Year
2010
DOI
10.1109/ICDE.2010.5447867
Venue
ICDE
Keywords
Bayes methods, data mining, load shedding, nonlinear programming, query processing, DSMS, SQL aggregates, aggregate queries, data mining functions, decision trees, extensible system, frequent pattern discovery, high volume data arrivals, k-means, mining queries, naive Bayesian classifiers, nonlinear programming algorithms, optimization, penalty functions, quality of service degradation, reciprocal error aggregates, statistical aggregates, user defined aggregates
Field
SQL, Data mining, Decision tree, k-means clustering, Naive Bayes classifier, Computer science, Nonlinear programming, Quality of service, Theoretical computer science, Linear programming, Time complexity, Database
DocType
Conference
ISSN
1084-4627
Citations
18
PageRank
0.76
References
18
Authors
2
Name
Order
Citations
PageRank
Barzan Mozafari181938.21
Carlo Zaniolo243051447.58