Title
A Fine‐Grained Distribution Approach for ETL Processes in Big Data Environments.
Abstract
Among the so-called “4Vs” (volume, velocity, variety, and veracity) that characterize the complexity of Big Data, this paper focuses on “Volume” in order to ensure good performance for Extracting-Transforming-Loading (ETL) processes. We propose a new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW). Unlike prior approaches that distribute the ETL process only at a coarse-grained level of processing, our approach provides different ways of parallelization/distribution at the process, functionality, and elementary-function levels. An ETL process is described in terms of its core functionalities, which can run on a cluster of computers according to the MapReduce (MR) paradigm. The approach thereby allows the ETL process to be distributed at three levels: the “process” level for coarse-grained distribution, and the “functionality” and “elementary function” levels for fine-grained distribution. Our performance analysis reveals that employing 25 to 38 parallel tasks enables the approach to speed up the ETL process by up to 33%, with the improvement rate being linear.
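As a rough illustration of what running a single ETL functionality according to the MapReduce paradigm can look like, the following minimal sketch splits the input into partitions, runs one map task per partition in parallel, and merges the partial results in a reduce step. It is not the authors' implementation: the sample rows, the function names (`partition`, `map_task`, `reduce_task`, `run_functionality`), and the use of local threads in place of a cluster are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source rows: (product key, amount stored as text).
ROWS = [("a", "10"), ("b", "5"), ("a", "7"), ("c", "1"), ("b", "2"), ("a", "3")]


def partition(rows, n):
    """Split rows into n partitions in round-robin order."""
    return [rows[i::n] for i in range(n)]


def map_task(rows):
    """Map phase: apply an elementary function (text -> int conversion)
    and pre-aggregate the amounts per key inside one partition."""
    partial = defaultdict(int)
    for key, amount in rows:
        partial[key] += int(amount)
    return dict(partial)


def reduce_task(partials):
    """Reduce phase: merge the per-partition partial sums into totals."""
    totals = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)


def run_functionality(rows, n_tasks):
    """Run one ETL functionality (convert + aggregate) using n_tasks
    parallel map tasks. Threads stand in for cluster nodes here."""
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partials = list(pool.map(map_task, partition(rows, n_tasks)))
    return reduce_task(partials)


if __name__ == "__main__":
    print(run_functionality(ROWS, n_tasks=3))
```

In this sketch the result does not depend on the number of tasks, because the aggregation is associative; only the amount of work done by each map task changes.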
Year: 2017
DOI: 10.1016/j.datak.2017.08.003
Venue: Data & Knowledge Engineering
Keywords: Data Warehousing, ETL, Parallel and Distributed Processing, Big Data, MapReduce
Field: Data warehouse, Data mining, Computer science, Elementary function, Big data, Database, Speedup
DocType: Journal
Volume: 111
Issue: 1
ISSN: 0169-023X
Citations: 0
PageRank: 0.34
References: 11
Authors: 3
Name | Order | Citations | PageRank
Mahfoud Bala | 1 | 0 | 0.68
Omar Boussaid | 2 | 312 | 46.88
Z. Alimazighi | 3 | 49 | 18.28