Title
Towards reliable interactive data cleaning: a user survey and recommendations
Abstract
Data cleaning is frequently an iterative process tailored to the requirements of a specific analysis task. The design and implementation of iterative data cleaning tools presents novel challenges, both technical and organizational, to the community. In this paper, we present results from a user survey (N = 29) of data analysts and infrastructure engineers from industry and academia. We highlight three important themes: (1) the iterative nature of data cleaning, (2) the lack of rigor in evaluating the correctness of data cleaning, and (3) the disconnect between the analysts who query the data and the infrastructure engineers who design the cleaning pipelines. We conclude by presenting a number of recommendations for future work in which we envision an interactive data cleaning system that accounts for the observed challenges.
Year
2016
DOI
10.1145/2939502.2939511
Venue
HILDA@SIGMOD
Field
Entity linking, Online learning, Data science, Data mining, Iterative and incremental development, Computer science, Correctness, Database
DocType
Conference
Citations
12
PageRank
0.63
References
27
Authors
4
Name | Order | Citations | PageRank
S. Krishnan | 1 | 391 | 36.25
Daniel Haas | 2 | 100 | 5.74
Michael J. Franklin | 3 | 17423 | 1681.10
Eugene Wu | 4 | 691 | 45.52