Title
Discovering and merging related analytic datasets
Abstract
The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, despite the profusion of available datasets, it remains quite difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. This article describes a model and algorithms that exploit automatically extracted and user-defined semantic relationships for extending analytic datasets with new atomic or aggregated attribute values. Our framework is implemented as a REST service in SAP HANA and includes a careful theoretical analysis and practical solutions for several complex data quality issues.
Year
2020
DOI
10.1016/j.is.2020.101495
Venue
Information Systems
Keywords
Schema augmentation, Schema complement, Data quality, SAP HANA
DocType
Journal
Volume
91
ISSN
0306-4379
Citations
0
PageRank
0.34
References
0
Authors
4
Name                Order  Citations  PageRank
Rutian Liu          1      0          0.34
Eric Simon          2      100        18.09
Bernd Amann         3      425        59.99
Stéphane Gançarski  4      172        20.55