Title
Efficiently support MapReduce-like computation models inside parallel DBMS
Abstract
While parallel DBMSs support large-scale parallel query processing on partitioned data, extending them to more general applications relies on User Defined Functions (UDFs). However, existing UDF technology is insufficient both conceptually and practically. A UDF is not a relation-in, relation-out operator, which restricts its ability to model complex applications defined on a set of tuples rather than on a single tuple, and to be composed with other relational operators in a query. Further, to interact efficiently with query execution, a UDF must be coded against DBMS internal data structures and system calls, a task that is often beyond the expertise of an analytics application developer. To solve these problems, we start by wrapping general applications with Relation Valued Functions (RVFs); then, based on the notion of invocation patterns, we provide focused system support for efficiently integrating RVF execution into the query processing pipeline. We further distinguish the system responsibility from the user responsibility in RVF development by separating an RVF into the RVF-Shell, which deals with system interaction, and the user-function, which contains pure application logic, so that the RVF-Shell can be constructed in terms of high-level APIs. These mechanisms enable us to solve the essential problems in supporting MapReduce and other analytics computation models inside a parallel database engine: modeling complex applications, integrating them into query processing, and shielding analytics developers from DBMS internal details. Our experience with a prototype built on a commercial, proprietary parallel database engine demonstrates the practical value of the proposed approaches.
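The separation the abstract describes, between an RVF-Shell that handles system interaction and a user-function that holds only application logic, can be illustrated with a minimal Python sketch. This is not the paper's API: the names `word_count_user_function`, `rvf_shell`, and `run_map_reduce_like` are hypothetical, relations are modeled as plain lists of tuples, and in-memory lists stand in for the partitions of a parallel DBMS.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]          # one tuple of a relation: (word, count)
Relation = List[Row]           # a relation modeled as a list of tuples


def word_count_user_function(rows: Relation) -> Relation:
    """Pure application logic: relation in, relation out (hypothetical)."""
    counts: Dict[str, int] = defaultdict(int)
    for word, n in rows:
        counts[word] += n
    return sorted(counts.items())


def rvf_shell(user_function: Callable[[Relation], Relation],
              partitions: List[Relation]) -> List[Relation]:
    """Hypothetical shell: owns the 'system' side (here, just iterating
    over partitions) and invokes the user-function once per partition."""
    return [user_function(list(partition)) for partition in partitions]


def run_map_reduce_like(user_function: Callable[[Relation], Relation],
                        partitions: List[Relation]) -> Relation:
    """Apply the function per partition, then once over the merged results."""
    partials = rvf_shell(user_function, partitions)
    merged = [row for part in partials for row in part]  # stand-in for data redistribution
    return user_function(merged)


partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("c", 1)],
]
result = run_map_reduce_like(word_count_user_function, partitions)
print(result)
```

Because the user-function takes and returns a relation, the same function can serve both the per-partition step and the final aggregation step in this toy setting; a real engine would replace the in-memory merge with its own data redistribution.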
Year
2009
DOI
10.1145/1620432.1620438
Venue
IDEAS
Keywords
mapreduce-like computation model, proprietary parallel database engine, query processing pipeline, parallel database engine, complex application, query execution, general application, parallel dbmss, system call, large scale parallel query, query processing, computer model, value function, data structure, application development, user defined function
Field
Query optimization, Data structure, Data mining, Programming language, Computer science, Tuple, Parallel database, Sargable, User-defined function, Relational operator, Analytics, Database
DocType
Conference
Citations
8
PageRank
0.65
References
17
Authors
6
Name          Order  Citations  PageRank
Qiming Chen   1      2010       233.16
Andy Therber  2      8          0.65
Meichun Hsu   3      3437       778.34
Hans Zeller   4      8          0.65
Bin Zhang     5      165        33.92
Ren Wu        6      92         17.28