Title
IncReStore: Incremental computation of MapReduce workflows
Abstract
Many applications in industry and research analyze large, continuously evolving datasets. Big data analytics platforms such as MapReduce focus on distributed batch processing, so a query must be re-executed every time its input data evolves. In this paper, we present IncReStore, a system that incrementally computes queries on fast-growing datasets by materializing query outputs and maintaining them. IncReStore runs in two modes: (1) Opportunistic IncReStore generates compensating queries on the fly during query execution so that previously materialized query outputs can be reused even when the underlying data has evolved; and (2) Active IncReStore automatically generates MapReduce jobs to update the materialized query outputs whenever the datasets they depend on evolve. We have implemented IncReStore as an extension to Pig and Hadoop. Our experimental evaluation of IncReStore using the TPC-H benchmark shows significant speedups.
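The general idea of reusing a materialized output when the input grows can be illustrated with a minimal sketch. This is not IncReStore's implementation: it assumes append-only input and a simple per-key count, and the names `run_query` and `compensate` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def run_query(records):
    """Batch query: count records per key (a stand-in for a group-by/count job)."""
    return Counter(key for key, _ in records)

def compensate(materialized, delta_records):
    """Merge a stored output with the same query run over only the new records."""
    return materialized + run_query(delta_records)

base = [("a", 1), ("b", 2), ("a", 3)]    # input seen by the first execution
delta = [("b", 4), ("c", 5)]             # records appended afterwards (assumed append-only)

materialized = run_query(base)           # output stored after the first execution
incremental = compensate(materialized, delta)

# The incremental result matches a full re-execution over base + delta.
assert incremental == run_query(base + delta)
```

The merge step works here because counting is distributive over appended data; queries with joins or non-distributive aggregates would need more involved compensation logic than this sketch shows.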
Year: 2016
DOI: 10.1109/ICDEW.2016.7495613
Venue: 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW)
Field: Query optimization, Web search query, Data mining, Metadata, Query language, Computer science, Sargable, Batch processing, Distributed database, Big data, Database
DocType: Conference
Citations: 0
PageRank: 0.34
References: 16
Authors: 3
Name | Order | Citations | PageRank
Ahmed Aziz Khalifa | 1 | 69 | 12.04
Iman Elghandour | 2 | 56 | 4.72
Nagwa M. El-Makky | 3 | 63 | 11.48