Title
Selectivity Functions of Range Queries are Learnable
Abstract
This paper explores the use of machine learning for estimating the selectivity of range queries in database systems. Using classic learning theory for real-valued functions based on shattering dimension, we show that the selectivity function of a range space with bounded VC-dimension is learnable. As many popular classes of queries (e.g., orthogonal range search, inequalities involving linear combination of attributes, distance-based search, etc.) represent range spaces with finite VC-dimension, our result immediately implies that their selectivity functions are also learnable. To the best of our knowledge, this is the first attempt at formally explaining the role of machine learning techniques in selectivity estimation, and complements the growing literature in empirical studies in this direction. Supplementing these theoretical results, our experimental results demonstrate that, empirically, even a basic learning algorithm with generic models is able to produce accurate predictions across settings, matching state-of-art methods designed for specific queries, and using training sample sizes commensurate with our theory.
Year
DOI
Venue
2022
10.1145/3514221.3517896
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22)
Keywords
DocType
ISSN
learning theory, selectivity estimation, range space, fat-shattering dimension, Vapnik-Chervonenkis dimension
Conference
0730-8078
Citations 
PageRank 
References 
0
0.34
0
Authors
7
Name
Order
Citations
PageRank
Xiaohua Hu12819314.15
Yuxi Liu200.34
Haibo Xiu300.34
Pankaj K. Agarwal45257593.81
Debmalya Panigrahi500.34
Sudeepa Roy626830.95
Jun Yang72762241.66