Concurrent online sampling for all, for free - Citegraph

Paper Info

Title
Concurrent online sampling for all, for free

Abstract
Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load. We introduce a novel highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.

Year	DOI	Venue
2020	10.1145/3399666.3399924	SIGMOD/PODS '20: International Conference on Management of Data Portland Oregon June, 2020
DocType	ISBN	Citations
Conference	978-1-4503-8024-9	0
PageRank	References	Authors
0.34	0	3

Authors (3 rows)

Cited by (0 rows)

References (0 rows)

Name	Order	Citations	PageRank
Altan Birler	1	0	0.34
Bernhard Radke	2	14	1.86
Thomas Neumann	3	2523	156.50

1