Title
Bit-Oriented Sampling for Aggregation on Big Data
Abstract
The efficiency of big data analysis has become a bottleneck. Aggregation is a fundamental analytical task, and it is usually time-consuming, so sampling-based aggregation is often used to improve response time at the cost of result accuracy. In all related work, sampling is conducted at the granularity of data items. Because the bits at different bit positions of a data item contribute differently to an aggregation result, the performance of sampling-based aggregation may improve if sampling is conducted at the granularity of bits. This paper therefore studies bit-oriented sampling for aggregation. Two methods of bit-oriented uniform-sampling-based aggregation, DVBM and DVFM, are proposed, based on the central limit theorem or Chebyshev's inequality. They are much more efficient than traditional data-oriented uniform-sampling-based aggregation methods. DVBM guarantees a given error bound for aggregation under the assumption that the sample variance equals the dataset variance. By contrast, DVFM achieves the same goal without that assumption, but it may require a larger sample size. Extensive experiments show that DVBM and DVFM are both efficient and effective.
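The idea that bits at different positions contribute differently to an aggregate can be illustrated with a small sketch. For non-negative integers of a fixed bit width, the sum equals the count of ones at each bit position weighted by the corresponding power of two, so each position can in principle be sampled separately. The code below is only an illustration under that assumption: it is not DVBM or DVFM, and the function names and the `sample_sizes` parameter are hypothetical. In particular, it does not show how the paper derives sample sizes from the central limit theorem or Chebyshev's inequality.

```python
import random


def exact_sum_by_bits(values, width):
    """Exact sum via bit decomposition: sum(x) = sum_j 2**j * ones_j."""
    total = 0
    for j in range(width):
        # Number of items whose bit j is set.
        ones_j = sum((x >> j) & 1 for x in values)
        total += (1 << j) * ones_j
    return total


def sampled_sum_by_bits(values, width, sample_sizes, rng):
    """Estimate the sum by uniformly sampling each bit position on its own.

    sample_sizes[j] is a hypothetical per-position sample size; a larger
    size can be given to high-order bits, which weigh more in the sum.
    """
    n = len(values)
    estimate = 0.0
    for j in range(width):
        m = min(sample_sizes[j], n)
        if m == 0:
            continue  # Position j is skipped and contributes nothing.
        sample = rng.sample(values, m)  # Uniform, without replacement.
        ones = sum((x >> j) & 1 for x in sample)
        # Scale the sampled count of ones up to the full dataset.
        estimate += (1 << j) * n * ones / m
    return estimate
```

Calling `sampled_sum_by_bits(data, 16, [8, 8, 8, 8] + [64] * 12, random.Random(0))`, for example, would inspect only a few items for the four low-order bits and more for the rest. When every sample size equals the dataset size, the estimate coincides with the exact sum.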
Year
2021
DOI
10.1109/TKDE.2019.2931014
Venue
Periodicals
Keywords
Big data analysis, sampling based aggregation, approximate query processing, bit-oriented sampling
DocType
Journal
Volume
33
Issue
2
ISSN
1041-4347
Citations
0
PageRank
0.34
References
0
Authors
2
Name, Order, Citations, PageRank
Huan Hu, 1, 0, 0.34
Jianzhong Li, 2, 632, 4.23