Title
Online aggregation and continuous query support in MapReduce
Abstract
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this demonstration, we describe a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We demonstrate a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs.
Year
2010
DOI
10.1145/1807167.1807295
Venue
SIGMOD Conference
Keywords
hadoop online prototype,batch job,hadoop mapreduce framework,modified mapreduce architecture,online aggregation,mapreduce programming model,batch processing,mapreduce task,continuous query support,fault tolerance property,fault tolerance,mapreduce program,batch process,programming model,fault tolerant,stream processing,distributed computing
Field
Event monitoring,Architecture,Programming paradigm,Computer science,Parallel computing,Fault tolerance,Batch processing,Operator (computer programming),Online aggregation,Stream processing,Database,Distributed computing
DocType
Conference
Citations
66
PageRank
2.22
References
9
Authors
8
Name, Order, Citations, PageRank
Tyson Condie, 1, 1162, 64.84
Neil Conway, 2, 458, 21.46
Peter Alvaro, 3, 463, 28.96
Joseph M. Hellerstein, 4, 14093, 1651.14
John Gerth, 5, 211, 19.77
Justin Talbot, 6, 312, 15.36
Khaled Elmeleegy, 7, 905, 47.10
Russell Sears, 8, 1799, 85.12