Abstract |
---|
Among the so-called “4Vs” (volume, velocity, variety, and veracity) that characterize the complexity of Big Data, this paper focuses on “Volume” in order to ensure good performance for Extract-Transform-Load (ETL) processes. In this study, we propose a new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW). Unlike prior approaches, which distribute the ETL process only at a coarse-grained level, our approach offers parallelization/distribution at the process, functionality, and elementary-function levels. An ETL process is described in terms of its core functionalities, which can run on a cluster of computers according to the MapReduce (MR) paradigm. The approach thereby allows the ETL process to be distributed at three levels: the “process” level for coarse-grained distribution, and the “functionality” and “elementary-function” levels for fine-grained distribution. Our performance analysis reveals that, with 25 to 38 parallel tasks, the approach speeds up the ETL process by up to 33%, with the improvement rate being linear. |
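The abstract's fine-grained level treats each elementary ETL function as a MapReduce job. As a minimal sketch of that idea (the function and field names below are illustrative, not taken from the paper), one aggregation step can be expressed as a map phase, a shuffle, and a reduce phase:

```python
from collections import defaultdict

# Hypothetical elementary ETL function, expressed in MapReduce style:
# aggregating source records by region before loading into the DW.

def map_phase(record):
    """Map: emit a (key, value) pair from a raw source record."""
    return (record["region"], record["amount"])

def shuffle(pairs):
    """Shuffle: group mapped values by key, as the MR runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate grouped values into one DW-ready row."""
    return {"region": key, "total": sum(values)}

records = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EU", "amount": 5},
]
rows = [reduce_phase(k, v)
        for k, v in sorted(shuffle(map(map_phase, records)).items())]
print(rows)  # [{'region': 'EU', 'total': 15}, {'region': 'US', 'total': 7}]
```

In an actual MR deployment, the map and reduce calls would run as parallel tasks across the cluster; the sketch only mimics that data flow on a single machine.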
Year | DOI | Venue |
---|---|---|
2017 | 10.1016/j.datak.2017.08.003 | Data & Knowledge Engineering |
Keywords | Field | DocType |
---|---|---|
Data Warehousing, ETL, Parallel and Distributed Processing, Big Data, MapReduce | Data warehouse, Data mining, Computer science, Elementary function, Big data, Database, Speedup | Journal |
Volume | Issue | ISSN |
---|---|---|
111 | 1 | 0169-023X |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 11 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mahfoud Bala | 1 | 0 | 0.68 |
Omar Boussaid | 2 | 312 | 46.88 |
Z. Alimazighi | 3 | 49 | 18.28 |