Title
A Hybrid Data Cleaning Framework Using Markov Logic Networks (Extended Abstract)
Abstract
As dirty data proliferates, data cleaning has become a crux of data analysis. In this paper, we propose a novel hybrid data cleaning framework, termed MLNClean, which is capable of learning instantiated rules to supplement insufficient integrity constraints. MLNClean consists of two steps: pre-processing and two-stage data cleaning. In the pre-processing step, MLNClean first infers a set of probable instantiated rules using a Markov logic network (MLN), and then builds a two-layer MLN index to generate multiple data versions and facilitate the cleaning process. In the two-stage data cleaning step, MLNClean first uses a reliability score to clean errors within each data version separately, and then eliminates conflicting values among different data versions using a novel fusion score. Extensive experimental results on both real and synthetic scenarios demonstrate the effectiveness of MLNClean.
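The abstract does not define the reliability score, the fusion score, or the MLN index, so the following is only a toy sketch of the two-stage idea, not the MLNClean algorithm. It makes several assumptions: the rules, column names, and function names are hypothetical; a majority-vote support fraction stands in for the reliability score; a sum of those fractions across versions stands in for the fusion score; and no MLN weight learning takes place.

```python
from collections import Counter, defaultdict

# Hypothetical instantiated rules (reason attribute -> result attribute).
# Each rule induces one "data version" of the table.
RULES = [("zip", "city"), ("phone", "city")]


def clean_within_version(rows, reason, result):
    """Stage 1 stand-in: inside each group of rows that share the same
    reason value, propose the most frequent result value. The support
    fraction is an assumed proxy for a reliability score."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[reason]].append(i)
    proposals = {}
    for idxs in groups.values():
        counts = Counter(rows[i][result] for i in idxs)
        value, support = counts.most_common(1)[0]
        for i in idxs:
            proposals[i] = (value, support / len(idxs))
    return proposals


def fuse_versions(rows, rules):
    """Stage 2 stand-in: when versions disagree on a cell, keep the
    candidate with the largest summed support (assumed proxy for a
    fusion score). Returns cleaned copies; the input is not modified."""
    versions = [(res, clean_within_version(rows, rea, res)) for rea, res in rules]
    cleaned = [dict(row) for row in rows]
    for i in range(len(rows)):
        scores = defaultdict(lambda: defaultdict(float))
        for result, proposals in versions:
            value, score = proposals[i]
            scores[result][value] += score
        for attr, candidates in scores.items():
            cleaned[i][attr] = max(candidates, key=candidates.get)
    return cleaned


# Made-up example data: row 3 has a misspelled city and a phone prefix
# that pulls it toward the Chicago group in the phone-based version.
rows = [
    {"zip": "10001", "phone": "212", "city": "New York"},
    {"zip": "10001", "phone": "212", "city": "New York"},
    {"zip": "10001", "phone": "212", "city": "New York"},
    {"zip": "10001", "phone": "312", "city": "New Yrok"},
    {"zip": "60601", "phone": "312", "city": "Chicago"},
    {"zip": "60601", "phone": "312", "city": "Chicago"},
]
cleaned = fuse_versions(rows, RULES)
```

In this made-up table, the zip-based version proposes "New York" for row 3 with support 3/4, while the phone-based version proposes "Chicago" with support 2/3, so the fused result keeps "New York". A real implementation would replace both proxies with the scores defined in the paper.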
Year
2021
DOI
10.1109/ICDE51399.2021.00258
Venue
2021 IEEE 37th International Conference on Data Engineering (ICDE 2021)
DocType
Conference
ISSN
1084-4627
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order   Citations   PageRank
Congcong Ge     1       0           1.69
Yunjun Gao      2       862         89.71
Xiaoye Miao     3       11          4.59
Bin Yao         4       365         32.71
Haobo Wang      5       9           3.81