Title
CurrentClean: Spatio-Temporal Cleaning of Stale Data
Abstract
Data currency is essential for up-to-date and accurate data analysis. Data is considered current if changes to real-world entities are reflected in the database. When they are not, stale data arises. Identifying and repairing stale data requires more than timestamps alone. Each entity has its own update patterns in both space and time, and these patterns can be learned and predicted from available query logs. In this paper, we present CurrentClean, a probabilistic system for identifying and cleaning stale values. We introduce a spatio-temporal probabilistic model that captures database update patterns to infer stale values, and we propose a set of inference rules that model the spatio-temporal update patterns commonly seen in real data. We recommend repairs for stale values by learning from the past update values of cells. Our evaluation on real data shows that CurrentClean is effective at identifying stale values and achieves better error detection and repair accuracy than state-of-the-art techniques.
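To make "update patterns in space and time" more concrete, the sketch below shows two simple signals that could be computed from a per-cell update log. It is not CurrentClean's probabilistic model or its inference rules, which the abstract does not specify. It is a hypothetical heuristic under stated assumptions. The log format (a dict mapping `(row, column)` to update timestamps), the functions `flag_stale_cells` and `co_update_evidence`, and the threshold `k` are all illustrative inventions. The temporal signal flags a cell when the time since its last update exceeds `k` times its mean historical gap between updates. The spatial signal is the fraction of other cells in the same row that were updated after this cell's last update.

```python
from statistics import mean

# Hypothetical log format: (row_id, column) -> list of update timestamps.
UpdateLog = dict[tuple[str, str], list[float]]


def flag_stale_cells(update_log: UpdateLog, now: float, k: float = 2.0) -> list[tuple[str, str]]:
    """Temporal heuristic (illustrative only): flag a cell if the time since
    its last update exceeds k times its mean historical inter-update gap."""
    stale = []
    for cell, times in update_log.items():
        times = sorted(times)
        if len(times) < 2:
            continue  # not enough history to estimate an update rate
        gaps = [b - a for a, b in zip(times, times[1:])]
        if now - times[-1] > k * mean(gaps):
            stale.append(cell)
    return sorted(stale)


def co_update_evidence(update_log: UpdateLog, cell: tuple[str, str]) -> float:
    """Spatial heuristic (illustrative only): fraction of other cells in the
    same row whose latest update is newer than this cell's latest update."""
    row, _ = cell
    last = max(update_log[cell])
    peers = [c for c in update_log if c[0] == row and c != cell]
    if not peers:
        return 0.0  # no same-row peers to compare against
    newer = sum(1 for c in peers if max(update_log[c]) > last)
    return newer / len(peers)


if __name__ == "__main__":
    log: UpdateLog = {
        ("t1", "price"): [0, 10, 20, 30],  # regular 10-unit cadence, then silent
        ("t1", "stock"): [0, 10, 20, 95],  # same row, updated recently
        ("t2", "price"): [0, 50, 100],     # 50-unit cadence, up to date
    }
    print(flag_stale_cells(log, now=100))            # cells that missed expected updates
    print(co_update_evidence(log, ("t1", "price")))  # same-row peers updated since
```

In this toy log, `("t1", "price")` is flagged by both signals. It usually updates every 10 units but has been silent for 70, and its only same-row peer changed after it did. A system like the one described would combine such evidence probabilistically rather than with a fixed threshold.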
Year
2019
DOI
10.1109/ICDE.2019.00024
Venue
2019 IEEE 35th International Conference on Data Engineering (ICDE)
Keywords
Currencies, Maintenance engineering, Databases, Probabilistic logic, Cleaning, Resists, Correlation
Field
Data mining, Computer science, Error detection and correction, Statistical model, Timestamp, Probabilistic logic, Rule of inference, Maintenance engineering
DocType
Conference
ISSN
1084-4627
ISBN
978-1-5386-7474-1
Citations
0
PageRank
0.34
References
0
Authors
3
Name | Order | Citations and PageRank (digits run together in source)
Z. Zheng | 1 | 8912.56
Mostafa Milani | 2 | 02.03
Fei Chiang | 3 | 25619.02