Title
PatchIndex - Exploiting Approximate Constraints in Self-managing Databases
Abstract
In the cloud environment, data warehouse solutions need to be self-managing in order to be usable without prior database administration knowledge. Additionally, data is typically not clean in these environments, as it is imported from various sources. As a consequence, automatic schema optimization, an important task of self-management, becomes difficult without human interaction and data cleaning steps. Within this paper, we focus on constraint discovery as a subtask of schema optimization. Real-world datasets with unclean data may not contain perfect constraints, as a small fraction of the values prevents their definition. Therefore, we introduce the PatchIndex structure, which handles these exceptions to column constraints and enables self-management tools to discover and define approximate constraints on unclean data. We present “nearly unique column” and “nearly sorted column” constraints, both managed by the generic PatchIndex structure. Furthermore, we provide mechanisms to discover these constraints and show how query performance can benefit from them for different use cases by integrating them into query optimization. Our evaluation shows that the PatchIndex structure offers opportunities for a significant performance boost in different use cases while enabling self-management tools to define constraints on unclean data.
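To make the idea of exceptions ("patches") to a column constraint concrete, the following is a minimal Python sketch. It is not the implementation described in the paper: the function names `unique_patches`, `sorted_patches`, and `count_distinct` are hypothetical, and the discovery strategies (marking every occurrence of a duplicated value, and keeping one longest non-decreasing subsequence) are assumptions chosen for illustration, since the abstract does not specify them.

```python
from bisect import bisect_right
from collections import Counter


def unique_patches(values):
    """Row ids whose value occurs more than once (assumed exception rule)."""
    counts = Counter(values)
    return {i for i, v in enumerate(values) if counts[v] > 1}


def sorted_patches(values):
    """Row ids outside one longest non-decreasing subsequence (assumed rule)."""
    tails = []      # tails[k]: row id ending the best subsequence of length k+1
    tail_vals = []  # values at those row ids, kept sorted for bisect
    prev = [None] * len(values)  # back-pointers to rebuild the subsequence
    for i, v in enumerate(values):
        k = bisect_right(tail_vals, v)
        prev[i] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(i)
            tail_vals.append(v)
        else:
            tails[k] = i
            tail_vals[k] = v
    keep = set()
    i = tails[-1] if tails else None
    while i is not None:
        keep.add(i)
        i = prev[i]
    return set(range(len(values))) - keep


def count_distinct(values, patches):
    """Distinct count that deduplicates only the patch rows.

    Valid when `patches` holds every occurrence of each duplicated value,
    so the remaining rows are unique and disjoint from the patch values.
    """
    clean_rows = len(values) - len(patches)
    return clean_rows + len({values[i] for i in patches})
```

For example, for the column `[10, 20, 20, 30]` the sketch marks rows 1 and 2 as patches, and `count_distinct` only needs to deduplicate those two rows. For `[1, 2, 9, 3, 4, 5]`, row 2 is the single exception to sortedness.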
Year: 2020
DOI: 10.1109/ICDEW49219.2020.00014
Venue: 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW)
Keywords: self-managing databases, schema refinement, approximate constraints, uniqueness, patch processing
DocType: Conference
ISSN: 1943-2895
ISBN: 978-1-7281-4267-8
Citations: 0
PageRank: 0.34
References: 7
Authors: 3
Name | Order | Citations | PageRank
Steffen Kläbe | 1 | 0 | 0.34
Kai-Uwe Sattler | 2 | 1144 | 126.81
Stephan Baumann | 3 | 0 | 0.34