Title
Swapping Repair for Misplaced Attribute Values
Abstract
Misplaced values within tuples are prevalent. For example, the value "Passport" may be misplaced in the passenger-name attribute when it actually belongs to the travel-document attribute. Repairing in-attribute errors, i.e., replacing an erroneous value with another value from the same attribute domain, has been widely studied. Misplacement errors, however, remain surprisingly unaddressed: here the true value is simply misplaced in some other attribute of the same tuple. In the example, the true passenger name is in fact misplaced in the travel-document attribute of the record. This calls for a novel swapping repair model, which swaps the misplaced passenger-name and travel-document values, "Passport" and "John Adam", within the same tuple. Determining a proper swapping repair, however, is non-trivial. The minimum change criterion, which evaluates the distance between a value before and after repair, is meaningless here, since the swapped values come from different attribute domains. Intuitively, one may instead examine whether a swapped value (e.g., "John Adam") is similar to other values in its new attribute domain (passenger-name). Taking a holistic view of all (swapped) attributes, we propose to evaluate the likelihood of a swapping repaired tuple by studying its distances (similarities) to its neighbors. The distance likelihood is grounded in modeling the appearance of nearest neighbors as a Poisson process. The optimum repair problem is then to find the swapping repair with the maximum likelihood on distances. Experiments over datasets with real-world misplaced attribute values demonstrate the effectiveness of our proposal in repairing misplacement.
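The abstract does not give the exact likelihood formula, so the following Python sketch only illustrates the idea under stated assumptions. Tuples are tuples of strings; the distance between two tuples is assumed to be the sum of per-attribute Levenshtein distances, each normalized by the longer value; candidate repairs are all permutations of the tuple's own values (a superset of pairwise swaps); and the distances to the k nearest clean neighbors are assumed to behave like arrivals of a one-dimensional homogeneous Poisson process with a fixed rate, so the gaps between successive sorted distances are exponentially distributed. The names levenshtein, tuple_distance, log_likelihood, and swapping_repair, the parameters k and rate, and the example data are hypothetical and are not the authors' implementation.

```python
import math
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tuple_distance(t, s) -> float:
    # Assumed metric: per-attribute edit distance normalized by the longer value.
    return sum(levenshtein(x, y) / max(len(x), len(y), 1) for x, y in zip(t, s))


def log_likelihood(t, neighbors, k=2, rate=1.0) -> float:
    # Assumption: sorted neighbor distances are Poisson-process arrivals with the
    # given rate, so each gap (d_i - d_{i-1}) is Exponential(rate).
    dists = sorted(tuple_distance(t, s) for s in neighbors)[:k]
    ll, last = 0.0, 0.0
    for d in dists:
        ll += math.log(rate) - rate * (d - last)
        last = d
    return ll


def swapping_repair(t, clean, k=2, rate=1.0):
    # Brute force over every rearrangement of the tuple's own values.
    # dict.fromkeys removes duplicates but keeps order, so on ties the
    # unchanged tuple (the first permutation) is preferred.
    candidates = list(dict.fromkeys(permutations(t)))
    return max(candidates, key=lambda c: log_likelihood(c, clean, k, rate))


# Hypothetical clean tuples: (passenger-name, travel-document).
clean = [
    ("John Smith", "Passport"),
    ("Jane Adams", "Passport"),
    ("Ann Lee", "Visa"),
]
dirty = ("Passport", "John Adam")
print(swapping_repair(dirty, clean))  # prints ('John Adam', 'Passport')
```

Because the gaps telescope, this assumed log-likelihood reduces to k·log(rate) − rate·d_k, so with a fixed rate the sketch prefers the candidate whose k-th nearest neighbor is closest. Enumerating all n! permutations is only practical for tuples with few attributes.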
Year: 2020
DOI: 10.1109/ICDE48307.2020.00068
Venue: 2020 IEEE 36th International Conference on Data Engineering (ICDE)
Keywords: passenger-name attribute, travel-document attribute, misplacement errors, novel swapping repair model, misplaced passenger-name, travel-document values, attribute domains, swapping repaired tuple, optimum repair problem, real-world misplaced attribute values, misplaced data, value Passport, in-attribute error repairing, minimum change criterion, distance likelihood rationale, Poisson process, nearest neighbor appearance
DocType: Conference
ISSN: 1063-6382
ISBN: 978-1-7281-2904-4
Citations: 0
PageRank: 0.34
References: 14
Authors: 4
Name | Order | Citations | PageRank
Yu Sun | 1 | 1 | 1.37
Shaoxu Song | 2 | 259 | 31.50
Chen Wang | 3 | 3 | 9.53
Jianmin Wang | 4 | 2446 | 156.05