Title
Concurrent online sampling for all, for free
Abstract
Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load. We introduce a novel highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.
Year
DOI
Venue
2020
10.1145/3399666.3399924
SIGMOD/PODS '20: International Conference on Management of Data Portland Oregon June, 2020
DocType
ISBN
Citations 
Conference
978-1-4503-8024-9
0
PageRank 
References 
Authors
0.34
0
3
Name
Order
Citations
PageRank
Altan Birler100.34
Bernhard Radke2141.86
Thomas Neumann32523156.50