Abstract |
---|
The appeal of MapReduce has spawned a family of systems that implement or extend it. To enable parallel collection processing with User-Defined Functions (UDFs), these systems expose extensions of the MapReduce programming model as library-based dataflow APIs that are tightly coupled to their underlying runtime engine. Expressing data analysis algorithms with complex data and control flow structure using such APIs reveals a number of limitations that impede programmer productivity. In this paper, we show that designing data analysis languages and APIs from a runtime-engine point of view bloats the APIs with low-level primitives and reduces programmer productivity. Instead, we argue that an approach based on deeply embedding the APIs in a host language can address the shortcomings of current data analysis languages. To demonstrate this, we propose a language for complex data analysis embedded in Scala, which (i) allows for declarative specification of dataflows and (ii) hides the notion of data parallelism and distributed runtime behind a suitable intermediate representation. We describe a compiler pipeline that facilitates efficient data-parallel processing without imposing runtime engine-bound syntactic or semantic restrictions on the structure of the input programs. We present a series of experiments with two state-of-the-art systems that demonstrate the optimization potential of our approach. |
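The abstract does not show the syntax of the proposed language. As a rough illustration of what "declarative specification of dataflows" in a host language can look like, the sketch below contrasts a Scala for-comprehension with the same logic written operator by operator. It uses plain in-memory Scala collections as a stand-in for a distributed dataset; `DataflowSketch`, `User`, `Click`, and the sample data are hypothetical names introduced here and are not the paper's API.

```scala
// Illustrative sketch only: ordinary Scala collections stand in for a
// distributed dataset. All names below are assumptions, not the paper's API.
object DataflowSketch {
  final case class User(id: Int, country: String)
  final case class Click(userId: Int, url: String)

  val users: Seq[User] = Seq(User(1, "DE"), User(2, "FR"), User(3, "DE"))
  val clicks: Seq[Click] = Seq(
    Click(1, "/a"), Click(1, "/b"), Click(2, "/a"), Click(3, "/c")
  )

  // Declarative form: the comprehension states which pairs to keep
  // (join on user id, filter on country) without fixing an operator order.
  val declarative: Seq[String] =
    for {
      u <- users
      c <- clicks
      if u.id == c.userId && u.country == "DE"
    } yield c.url

  // Explicit form: the same result, with the filter and the nested lookup
  // spelled out as individual collection operators.
  val explicit: Seq[String] =
    users
      .filter(_.country == "DE")
      .flatMap(u => clicks.filter(_.userId == u.id).map(_.url))

  def main(args: Array[String]): Unit = {
    println(declarative)
    println(explicit)
  }
}
```

Both values contain the same URLs in the same order. The point of the contrast is that the comprehension leaves the choice of join strategy and operator order open, which is the kind of freedom a compiler could exploit when mapping a program onto a parallel engine.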
Property | Value |
---|---|
Year | 2015 |
DOI | 10.1145/2723372.2750543 |
Venue | SIGMOD Record |
Field | Scala, Implicit parallelism, Programming language, Programmer, Programming paradigm, Computer science, Parallel computing, Control flow, Complex data type, Compiler, Dataflow, Database |
DocType | Conference |
Volume | 45 |
Issue | 1 |
Citations | 24 |
PageRank | 0.87 |
References | 28 |
Authors | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexander Alexandrov | 1 | 326 | 16.59 |
Andreas Kunft | 2 | 34 | 3.39 |
Asterios Katsifodimos | 3 | 226 | 18.53 |
Felix Schüler | 4 | 24 | 0.87 |
Lauritz Thamsen | 5 | 43 | 9.26 |
Odej Kao | 6 | 1066 | 96.19 |
Tobias Herb | 7 | 28 | 2.18 |
Volker Markl | 8 | 2245 | 182.37 |