Abstract

Data profiling is an enabler for efficient data management and effective analytics. The discovery of data dependencies is at the core of data profiling. We conduct the first study of the discovery of embedded uniqueness constraints (eUCs). These constraints represent unique column combinations embedded in the complete fragments of incomplete data. We showcase their implementation as filtered indexes and their application in integrity management and query optimization. We show that the decision variant of discovering a minimal eUC is NP-complete and W[2]-complete. We characterize the maximum possible solution size and show which families of eUCs attain that size. Despite these challenges, experiments with real-world and synthetic benchmark data show that our column-efficient and row-efficient algorithms perform well on data with a large number of columns and a large number of rows, respectively, and that our hybrid algorithm combines ideas from both. Finally, we show how to rank eUCs to help identify the relevant ones.
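
To make the notion concrete, here is a minimal Python sketch. It assumes rows are dictionaries with `None` marking a missing value, and it reads an eUC, written here as E:U with U ⊆ E, as: among the rows that have no missing value on E (the E-complete fragment), no two rows agree on U. The helpers `satisfies_euc` and `minimal_keys_for_scope` and the sample table are hypothetical illustrations. The search is a naive level-wise enumeration of subsets of a fixed E, not the column-efficient, row-efficient, or hybrid algorithms described above, and it is exponential in the size of E.

```python
from itertools import combinations
from typing import Hashable, Iterable, Mapping, Optional, Sequence

# A row maps column names to values; None stands for a missing (null) value.
Row = Mapping[str, Optional[Hashable]]


def satisfies_euc(rows: Iterable[Row], E: Sequence[str], U: Sequence[str]) -> bool:
    """Hypothetical check of the eUC E:U (assumed semantics, see lead-in).

    Only rows with no missing value on E are considered; within that
    fragment, no two rows may share the same values on U.
    """
    if not set(U) <= set(E):
        raise ValueError("U must be a subset of E")
    seen = set()
    for row in rows:
        if any(row.get(col) is None for col in E):
            continue  # row lies outside the E-complete fragment
        key = tuple(row[col] for col in U)
        if key in seen:
            return False  # two complete rows agree on U
        seen.add(key)
    return True


def minimal_keys_for_scope(rows: Sequence[Row], E: Sequence[str]) -> list[tuple[str, ...]]:
    """Naive level-wise search for all minimal U within a fixed E such that E:U holds."""
    minimal: list[tuple[str, ...]] = []
    for size in range(len(E) + 1):
        for U in combinations(E, size):
            # Any superset of a satisfied U also holds, but is not minimal.
            if any(set(m) <= set(U) for m in minimal):
                continue
            if satisfies_euc(rows, E, U):
                minimal.append(U)
    return minimal


rows = [
    {"a": 1, "b": "x", "c": None},
    {"a": 1, "b": "y", "c": 10},
    {"a": 2, "b": "x", "c": 10},
    {"a": 2, "b": "y", "c": None},
]

print(minimal_keys_for_scope(rows, ("a", "b", "c")))  # [('a',), ('b',)]
print(minimal_keys_for_scope(rows, ("a", "b")))       # [('a', 'b')]
```

In the sample, column `a` is not unique over the rows complete on {a, b}, yet it is unique within the smaller fragment of rows complete on {a, b, c}. A uniqueness check over all rows would miss this constraint.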
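
The filtered-index implementation mentioned in the abstract can be approximated in SQLite, which calls such indexes partial indexes. The idea is a unique index on the columns of U whose `WHERE` clause admits only rows with no NULL on E. The sketch below uses Python's standard `sqlite3` module. The helper `create_euc_index`, the table `t`, and the index name are hypothetical. Identifiers are interpolated into the SQL without quoting, so they must come from a trusted source.

```python
import sqlite3


def create_euc_index(conn, table, E, U, name):
    """Hypothetical helper: enforce the eUC E:U with a partial unique index.

    Identifiers are interpolated as-is and must be trusted.
    """
    if not set(U) <= set(E):
        raise ValueError("U must be a subset of E")
    where = " AND ".join(f"{col} IS NOT NULL" for col in E)
    cols = ", ".join(U)
    conn.execute(f"CREATE UNIQUE INDEX {name} ON {table} ({cols}) WHERE {where}")


def try_insert(conn, row):
    """Insert a row into t; return False if the partial unique index rejects it."""
    try:
        conn.execute("INSERT INTO t VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c INTEGER)")
# Column a must be unique among rows complete on {a, b, c}.
create_euc_index(conn, "t", ("a", "b", "c"), ("a",), "euc_abc_a")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", None), (1, "y", 10), (2, "x", 10), (2, "y", None)],
)

print(try_insert(conn, (1, "z", None)))  # True: the row is outside the complete fragment
print(try_insert(conn, (1, "z", 20)))    # False: a = 1 already occurs among complete rows
```

Rows with a NULL on any column of E never enter the index, so they are exempt from the constraint. This matches the embedded reading of E:U rather than an ordinary UNIQUE constraint over the whole table.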

Field | Value |
---|---|
Year | 2019 |
DOI | 10.14778/3358701.3358703 |
Venue | Proceedings of the VLDB Endowment |
Document type | Journal |
Volume | 12 |
Issue | 13 |
ISSN | 2150-8097 |
Citations | 0 |
PageRank | 0.34 |
References | 0 |
Authors | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Ziheng Wei | 1 | 8 | 6.92 |
Uwe Leck | 2 | 0 | 0.34 |
Sebastian Link | 3 | 462 | 39.59 |