Title
Discovery and ranking of embedded uniqueness constraints
Abstract
Data profiling is an enabler for efficient data management and effective analytics. The discovery of data dependencies is at the core of data profiling. We conduct the first study on the discovery of embedded uniqueness constraints (eUCs). These constraints represent unique column combinations embedded in complete fragments of incomplete data. We showcase their implementation as filtered indexes, and their application in integrity management and query optimization. We show that the decision variant of discovering a minimal eUC is NP-complete and W[2]-complete. We characterize the maximum possible solution size, and show which families of eUCs attain that size. Despite these challenges, experiments with real-world and synthetic benchmark data show that our column-efficient and row-efficient algorithms perform well with large numbers of columns and rows, respectively, and that our hybrid algorithm combines ideas from both. We show how to rank eUCs to help identify relevant eUCs.
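To make the notion of an eUC concrete, the following minimal sketch checks whether one holds on a small table. It is an illustration only, not the discovery algorithms from the paper. It assumes an eUC can be read as a pair (E, U) with U a subset of E, which holds when no two rows that are complete (non-null) on every column in E agree on all columns in U. The function name `holds_euc`, the parameter names, and the sample table are hypothetical.

```python
from typing import Hashable, Iterable, Mapping, Optional, Sequence

# A row maps column names to values; None stands for a missing (null) value.
Row = Mapping[str, Optional[Hashable]]


def holds_euc(rows: Iterable[Row],
              embedding: Sequence[str],
              unique_cols: Sequence[str]) -> bool:
    """Check an assumed eUC (embedding, unique_cols) on the given rows.

    Only rows with no nulls in `embedding` are considered. Within that
    complete fragment, the values on `unique_cols` must be pairwise distinct.
    """
    if not set(unique_cols) <= set(embedding):
        raise ValueError("unique_cols must be a subset of embedding")

    seen = set()
    for row in rows:
        # Skip rows outside the complete fragment defined by the embedding.
        if any(row[col] is None for col in embedding):
            continue
        key = tuple(row[col] for col in unique_cols)
        if key in seen:
            return False  # two complete rows agree on all unique columns
        seen.add(key)
    return True


# Hypothetical sample data: phone numbers are missing for some employees.
employees = [
    {"name": "Ann", "dept": "R&D", "phone": "101"},
    {"name": "Bob", "dept": "R&D", "phone": None},
    {"name": "Ann", "dept": "Ops", "phone": None},
    {"name": "Cy", "dept": "Ops", "phone": "102"},
]

# name is not unique over all rows, but it is unique among rows with a phone.
print(holds_euc(employees, ["name"], ["name"]))
print(holds_euc(employees, ["name", "phone"], ["name"]))
```

Under this reading, the second check corresponds to what a unique filtered index on `name`, restricted to rows where `phone` is not null, would enforce.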
Year: 2019
DOI: 10.14778/3358701.3358703
Venue: Proceedings of the VLDB Endowment
DocType: Journal
Volume: 12
Issue: 13
ISSN: 2150-8097
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order  Citations  PageRank
Ziheng Wei      1      8          6.92
Uwe Leck        2      0          0.34
Sebastian Link  3      462        39.59