Title
The VADA Architecture for Cost-Effective Data Wrangling.
Abstract
Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
Year
2017
DOI
10.1145/3035918.3058730
Venue
SIGMOD Conference
Keywords
Data Wrangling
Field
Data science, Data mining, Architecture, Computer science, Open government, Automation, Data wrangling, Database
DocType
Conference
Citations
4
PageRank
0.40
References
10
Authors
11
Name                      Order  Citations  PageRank
Nikolaos Konstantinou     1      88         10.73
Martin Koehler            2      56         8.05
Edward Abel               3      24         4.85
Cristina Civili           4      5          1.77
Bernd Neumayr             5      163        16.83
Emanuel Sallinger         6      71         20.76
Alvaro A. A. Fernandes    7      14         3.65
Georg Gottlob             8      9594       1103.48
John A. Keane             9      695        92.81
Leonid Libkin             10     3446       764.02
Norman W. Paton           11     3059       359.26