Title
The analytical bootstrap: a new method for fast error estimation in approximate query processing
Abstract
Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP)-an area of research that is now made more critical by the need for timely and cost-effective analytics over \"Big Data\". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second is quite general but suffers from its high computational overhead. In this paper, we introduce a probabilistic relational model for the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches. Based on our probabilistic framework, we develop efficient algorithms to predict the distribution of the approximation results. These enable the computation of any bootstrap-based quality measure for a large class of SQL queries via a single-round evaluation of a slightly modified query. Extensive experiments on both synthetic and real-world datasets show that our method has superior prediction accuracy for bootstrap-based quality measures, and is several orders of magnitude faster than bootstrap.
Year
DOI
Venue
2014
10.1145/2588555.2588579
SIGMOD Conference
Keywords
Field
DocType
approximate query processing,bootstrap,error estimation,query processing
SQL,Data mining,Overhead (computing),Computer science,Bootstrap aggregating,Sampling (statistics),Analytics,Big data,Bootstrapping (electronics),Database,Computation
Conference
Citations 
PageRank 
References 
44
1.14
31
Authors
4
Name
Order
Citations
PageRank
Kai Zeng137323.16
Shi Gao21127.72
Barzan Mozafari381938.21
Carlo Zaniolo443051447.58