Title
Very fast estimation for result and accuracy of big data analytics: The EARL system
Abstract
Approximate results based on samples often provide the only way in which advanced analytical applications on massive data sets (a.k.a. ‘big data’) can satisfy their time and resource constraints. Unfortunately, methods and tools for computing accurate early results are currently not supported in big data systems (e.g., Hadoop). We therefore propose a nonparametric accuracy estimation method and system to speed up big data analytics. Our framework, called EARL (Early Accurate Result Library), works by predicting the learning curve and choosing the appropriate sample size for achieving the error bound specified by the user. The error estimates are based on bootstrapping, a technique that has been widely used and validated by statisticians and that can be applied to arbitrary functions and data distributions. This demo will elucidate (a) the functionality of EARL and its intuitive GUI, whereby first-time users can appreciate the accuracy obtainable from increasing sample sizes by simply viewing the learning curve displayed by EARL, and (b) the usability of EARL, whereby conference participants can interact with the system to quickly estimate the sample sizes needed to obtain the desired accuracies or response times, and then compare them against the accuracies and response times obtained in the actual computations.
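The idea in the abstract (bootstrap-based error estimates on growing samples, stopping once a user-specified error bound is met) can be illustrated with a minimal sketch. This is not EARL's implementation: the function names `bootstrap_relative_error` and `early_result`, the doubling schedule, the use of relative standard error as the accuracy measure, and drawing a fresh sample at each step are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_relative_error(sample, statistic, n_resamples=200, rng=None):
    """Estimate the relative standard error of `statistic` on `sample`
    by resampling with replacement (the nonparametric bootstrap)."""
    if rng is None:
        rng = random.Random(0)
    n = len(sample)
    # Recompute the statistic on each resample of the same size as the sample.
    estimates = [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    center = statistics.fmean(estimates)
    spread = statistics.stdev(estimates)
    return spread / abs(center) if center != 0 else float("inf")


def early_result(data, statistic, error_bound, start=100, growth=2, seed=0):
    """Grow the sample until the bootstrap error estimate meets `error_bound`.

    Returns (estimate, sample_size, curve), where `curve` is a list of
    (sample_size, estimated_error) pairs, i.e. a rough learning curve.
    Hypothetical helper: the schedule and stopping rule are assumptions.
    """
    rng = random.Random(seed)
    size = min(start, len(data))
    curve = []
    while True:
        sample = rng.sample(data, size)  # simple random sample, no replacement
        err = bootstrap_relative_error(sample, statistic, rng=rng)
        curve.append((size, err))
        # Stop when the bound is met or the whole data set has been used.
        if err <= error_bound or size == len(data):
            return statistic(sample), size, curve
        size = min(size * growth, len(data))


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.gauss(50.0, 10.0) for _ in range(20000)]
    estimate, used, curve = early_result(data, statistics.fmean, error_bound=0.01)
    print(f"estimate={estimate:.2f} using {used} of {len(data)} rows")
    for size, err in curve:
        print(f"  sample size {size}: estimated relative error {err:.4f}")
```

Because the bootstrap only requires re-evaluating the statistic on resamples, the same loop works for any function passed as `statistic` (median, a quantile, a model score), which is the property the abstract highlights.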
Year
2013
DOI
10.1109/ICDE.2013.6544928
Venue
ICDE
Keywords
EARL system, sample size, response time, nonparametric accuracy estimation method, error estimate, big data system, speedup big data analytics, appropriate sample size, data distribution, massive data set, fast estimation, big data
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order  Citations  PageRank
Carlo Zaniolo   1      4305       1447.58
Kai Zeng        2      195        12.99
Nikolay Laptev  3      163        11.07