Title
Towards Automatic Data Format Transformations: Data Wrangling at Scale
Abstract
Data wrangling is the process whereby data are cleaned and integrated for analysis. Data wrangling, even with tool support, is typically a labour-intensive process. One aspect of data wrangling involves carrying out format transformations on attribute values, for example so that names or phone numbers are represented consistently. Recent research has developed techniques for synthesizing format transformation programs from examples of the source and target representations. This is valuable, but still requires a user to provide suitable examples, which may be challenging in applications with huge datasets or numerous data sources. In this paper, we investigate the automatic discovery of examples that can be used to synthesize format transformation programs. In particular, we propose two approaches to identifying candidate data examples and validating the transformations that are synthesized from them. The approaches are evaluated empirically using datasets from open government data.
Year: 2017
DOI: 10.1093/comjnl/bxy118
Venue: COMPUTER JOURNAL
Keywords: format transformations, data wrangling, program synthesis
Field: Data format, Computer science, Theoretical computer science, Data wrangling, Multimedia
DocType: Conference
Volume: 62
Issue: 7
ISSN: 0010-4620
Citations: 2
PageRank: 0.37
References: 0
Authors: 4
Name
Order
Citations
PageRank
Alex Bogatu131.75
Norman W. Paton23059359.26
Alvaro A. A. Fernandes3143.65
Martin Koehler4568.05