Abstract |
---|
Recent research on sampling-based join size estimation has focused on a promising new technique known as correlated sampling. While several variants of this technique have been proposed, there is a lack of a systematic study of this family of techniques. In this paper, we first introduce a framework to characterize its design space in terms of five parameters. Based on this framework, we propose a new correlated sampling based technique to address the limitations of existing techniques. Our new technique is based on using a discrete learning method for estimating the join size from samples. We experimentally compare the performance of multiple variants of our new technique and identify a hybrid variant that provides the best estimation quality. This hybrid variant not only outperforms the state-of-the-art correlated sampling technique, but it is also more robust to small samples and skewed data. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/ICDE48307.2020.00035 | 2020 IEEE 36th International Conference on Data Engineering (ICDE) |
Keywords | DocType | ISSN |
---|---|---|
query processing, database systems, sampling methods | Conference | 1063-6382 |
ISBN | Citations | PageRank |
---|---|---|
978-1-7281-2904-4 | 0 | 0.34 |
References | Authors |
---|---|
21 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
TaiNing Wang | 1 | 0 | 0.68 |
Chee Yong Chan | 2 | 643 | 199.24 |