Title | ||
---|---|---|
Nephele/PACTs: a programming model and execution framework for web-scale analytical processing |
Abstract | ||
---|---|---|
We present a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with further second-order functions, as well as with Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization, and fault tolerance. Our definition of PACTs allows several types of optimizations to be applied to the data flow during the transformation. The system as a whole is designed to be as generic as (and compatible with) map/reduce systems, while overcoming several of their major weaknesses: 1) The functions map and reduce alone are not sufficient to express many data processing tasks both naturally and efficiently. 2) Map/reduce ties a program to a single fixed execution strategy, which is robust but highly suboptimal for many tasks. 3) Map/reduce makes no assumptions about the behavior of the functions and hence offers only very limited optimization opportunities. With a set of examples and experiments, we illustrate how our system is able to naturally represent and efficiently execute several tasks that do not fit the map/reduce model well. |
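
The idea of second-order functions in the abstract can be illustrated with a small, single-machine sketch. It is not the Nephele/PACT API: the names `map_contract`, `reduce_contract`, and `match_contract` are hypothetical, and the join-like two-input function is an assumed example of a "further second-order function", since the abstract does not name the additional ones. Each function takes a user-supplied first-order function and decides which records that function is called on.

```python
from collections import defaultdict


def map_contract(udf, records):
    """Call udf independently on every (key, value) record."""
    out = []
    for key, value in records:
        out.extend(udf(key, value))
    return out


def reduce_contract(udf, records):
    """Group records by key, then call udf once per key with all its values."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    out = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        out.extend(udf(key, groups[key]))
    return out


def match_contract(udf, left, right):
    """Hypothetical two-input function: call udf once for every pair of
    records from left and right that share the same key."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    out = []
    for key, left_value in left:
        for right_value in right_by_key.get(key, []):
            out.extend(udf(key, left_value, right_value))
    return out


# Illustrative data: (customer_id, item) and (customer_id, name).
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
customers = [(1, "Ada"), (2, "Bob")]

# Join orders with customers, re-keying each result by customer name.
joined = match_contract(lambda k, item, name: [(name, item)], orders, customers)

# Count the items per customer name.
counts = reduce_contract(lambda name, items: [(name, len(items))], joined)
```

With only map and reduce, the join step would have to be encoded by tagging and merging both inputs inside a single reduce, which fixes the execution strategy. A dedicated two-input function leaves that choice to the system, which is the kind of optimization opportunity the abstract describes.
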
Year | DOI | Venue |
---|---|---|
2010 | 10.1145/1807128.1807148 | SoCC |
Keywords | Field | DocType |
well-known map,pact program,web-scale analytical processing,parallel data processor,single fixed execution strategy,output contracts,scalable parallel execution engine,data flow,programming model,execution framework,parallelization contracts,pact programming model,cloud computing,second order,data processing,fault tolerant | Synchronization,Data processing,Programming paradigm,Computer science,Parallel computing,Data processing system,Real-time computing,Fault tolerance,Data flow diagram,Distributed computing,Cloud computing,Scalability | Conference |
Citations | PageRank | References |
147 | 6.11 | 18 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dominic Battré | 1 | 257 | 20.40 |
Stephan Ewen | 2 | 602 | 23.70 |
Fabian Hueske | 3 | 489 | 20.81 |
Odej Kao | 4 | 1066 | 96.19 |
Volker Markl | 5 | 2245 | 182.37 |
Daniel Warneke | 6 | 601 | 27.20 |