Title
An ETL optimization framework using partitioning and parallelization
Abstract
Extract-Transform-Load (ETL) handles large amounts of data and manages workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. To minimize the time and resources required by ETL dataflows, this paper presents an optimization framework using partitioning and parallelization. The framework first partitions an ETL dataflow into multiple execution trees according to the characteristics of ETL constructs; within each execution tree, pipelined parallelism and a shared cache are then used to optimize the partitioned dataflow. Furthermore, multi-threading is used in component-based optimization. The experimental results show that the proposed framework can be 4.7 times faster than ordinary ETL dataflows (those not using the proposed partitioning and optimization methods), and that its performance is comparable to that of similar ETL tools.
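As a rough illustration of the pipelined parallelism and shared cache mentioned in the abstract, the following minimal Python sketch runs each component of a linear dataflow in its own thread and connects the components with bounded queues. It is a generic example, not the framework's implementation: the names `run_pipeline`, `run_stage`, `parse_row`, `add_country`, and `COUNTRY_CACHE` are hypothetical, the "shared cache" is reduced to a read-only dictionary, and error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream

# Hypothetical lookup table standing in for a shared cache; it is only read.
COUNTRY_CACHE = {"DK": "Denmark", "ES": "Spain"}


def parse_row(line):
    """Split a 'id,code' line into a (id, code) tuple."""
    key, code = line.split(",")
    return int(key), code


def add_country(record):
    """Replace the country code with its name using the shared cache."""
    key, code = record
    return key, COUNTRY_CACHE.get(code, "unknown")


def run_stage(func, inbox, outbox):
    """Apply func to every row from inbox and forward results to outbox."""
    while True:
        row = inbox.get()
        if row is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(func(row))


def run_pipeline(rows, stages, queue_size=64):
    """Run the stages concurrently, one thread per stage, preserving row order."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]

    def feed():
        # Feeding runs in its own thread so the caller can drain the output
        # while the bounded queues apply back-pressure.
        for row in rows:
            queues[0].put(row)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for thread in workers + [feeder]:
        thread.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for thread in workers + [feeder]:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["1,DK", "2,ES"], [parse_row, add_country]))
```

Because each stage is a single thread reading from a FIFO queue, rows leave the pipeline in the order they entered, and a later stage can start working on early rows while earlier stages are still processing the rest.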
Year
2015
DOI
10.1145/2695664.2695846
Venue
SAC 2015: Symposium on Applied Computing, Salamanca, Spain, April 2015
Keywords
Dataflow Partitioning, Optimization, Shared Cache, Execution Tree
Field
Shared memory, Computer science, Workload, Parallel computing, Dataflow
DocType
Conference
ISBN
978-1-4503-3196-8
Citations
3
PageRank
0.37
References
14
Authors
2
Name, Order, Citations, PageRank
Xiufeng Liu, 1, 108, 14.69
Nadeem Iftikhar, 2, 80, 11.50