Title
Algorithms for the discovery of embedded functional dependencies
Abstract
Embedded functional dependencies (eFDs) advance data management applications by data completeness and integrity requirements. We show that the discovery problem of eFDs is $${\mathsf {NP}}$$ -complete, $$\mathsf {W}[2]$$ -complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Nevertheless, we use novel data structures and search strategies to develop row-efficient, column-efficient, and hybrid algorithms for eFD discovery. Our experiments demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. We further demonstrate the benefits of introducing completeness requirements and ranking by the number of redundant data values for other variants of functional dependencies. Finally, we show how to compute informative Armstrong samples and illustrate the performance of our algorithms on the benchmark data. The informative Armstrong samples can be used to find eFDs that are meaningful for the application domain but violated by a given data set due to inconsistencies.
Year
DOI
Venue
2021
10.1007/s00778-021-00684-3
The VLDB Journal
Keywords
DocType
Volume
Algorithm, Armstrong sample, Completeness requirement, Data redundancy, Discovery, Integrity requirement, Intractability, Missing data, Functional Dependency
Journal
30
Issue
ISSN
Citations 
6
1066-8888
0
PageRank 
References 
Authors
0.34
0
3
Name
Order
Citations
PageRank
Ziheng Wei186.92
Sven Hartmann240942.86
Sebastian Link346239.59