Entropy-Based Approach to Efficient Cleaning of Big Data in Hierarchical Databases. - Citegraph

Paper Info

Title
Entropy-Based Approach to Efficient Cleaning of Big Data in Hierarchical Databases.

Abstract
When databases are at risk of containing erroneous, redundant, or obsolete data, a cleaning procedure is used to detect, correct, or remove such undesirable records. We propose a methodology for improving data cleaning efficiency in a large hierarchical database. The methodology relies on Shannon’s information entropy for measuring the amount of information stored in databases. This approach, which builds on previously gathered statistical data regarding the prevalence of errors in the database, enables the decision maker to determine which components of the database are likely to have undergone more information loss, and thus to prioritize those components for cleaning. In particular, in cases where the cleaning process is iterative (from the root node down), the entropic approach produces a scientifically motivated stopping rule that determines the optimal (i.e., minimally required) number of tiers in the hierarchical database that need to be examined. This stopping rule defines a more streamlined representation of the database, in which less informative tiers are eliminated.
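
The abstract gives no formulas, so the following is only a rough illustration of the general idea, not the authors' method. It computes the standard Shannon entropy of the values stored in each tier and walks the tiers from the root down, stopping at the first tier that carries little information. The function names, the `min_bits` threshold, and the representation of a tier as a flat list of values are all assumptions made for this sketch. It also omits the error-prevalence statistics that the paper relies on.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # H = sum over distinct values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def tiers_to_examine(tiers, min_bits=0.1):
    """Hypothetical stopping rule (an assumption, not taken from the paper).

    `tiers` is a list of value lists ordered from the root tier downward.
    Returns how many leading tiers have entropy of at least `min_bits`;
    the walk stops at the first tier below that threshold.
    """
    depth = 0
    for values in tiers:
        if shannon_entropy(values) < min_bits:
            break
        depth += 1
    return depth


# Toy hierarchy: the third tier holds a single repeated value (zero entropy).
example_tiers = [
    ["a", "b", "c", "d"],
    ["x", "y", "x", "y"],
    ["z", "z", "z", "z"],
    ["p", "q"],
]
print(tiers_to_examine(example_tiers))
```

In this toy input the walk stops before the third tier, so only the first two tiers would be examined. A real stopping rule would presumably weigh entropy against the known error rates per tier, which this sketch does not attempt.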

Year	DOI	Venue
2020	10.1007/978-3-030-59612-5_1	BigData Congress
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	7

Authors (7 rows)

Name	Order	Citations	PageRank
Eugene Levner	1	466	48.53
Boris Kriheli	2	8	3.28
Arriel Benis	3	0	0.34
Alexander Ptuskin	4	0	0.34
Amir Elalouf	5	22	5.99
Sharon Hovav	6	0	0.34
Shai Ashkenazi	7	0	0.34
