Title
RHEEMix in the Data Jungle: A Cross-Platform Query Optimizer
Abstract
In pursuit of efficient and scalable data analytics, the insight that "one size does not fit all" has given rise to a plethora of specialized data processing platforms, and today's complex data analytics are moving beyond the limits of a single platform. To cope with these new requirements, we present a cross-platform optimizer that allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to efficiently plan data movement among subtasks and platforms; and (iii) an efficient plan enumeration algorithm based on a novel enumeration algebra. We extensively evaluate our optimizer on diverse real tasks. The results show that our optimizer can select the most efficient platform combination for a given task, freeing data analysts from the need to choose and orchestrate platforms. In particular, our optimizer allows certain tasks to run more than one order of magnitude faster than on state-of-the-art platforms such as Spark.
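The central trade-off in the abstract, where splitting a task across platforms can pay off only if per-platform execution savings outweigh the cost of moving data between platforms, can be illustrated with a toy example. This is a minimal sketch, not RHEEMix's algorithm: it assumes a linear pipeline of operators (the paper handles general plans), made-up cost numbers, and hypothetical names (`cheapest_assignment`, `OP_COSTS`, `MOVE_COST`), and it uses a simple dynamic program in place of the paper's graph-based data-movement planning and enumeration algebra.

```python
from typing import Dict, List, Tuple


def cheapest_assignment(
    op_costs: List[Dict[str, float]],
    move_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Toy optimizer (hypothetical): assign one platform per operator in a
    linear pipeline, minimizing execution cost plus data-movement cost."""
    platforms = list(op_costs[0])

    def transfer(src: str, dst: str) -> float:
        # Staying on the same platform is free; switching costs a conversion.
        return 0.0 if src == dst else move_cost[(src, dst)]

    # best[p] = (cheapest total cost, platform sequence) for the prefix ending on p
    best = {p: (op_costs[0][p], [p]) for p in platforms}
    for costs in op_costs[1:]:
        nxt = {}
        for p in platforms:
            # Cheapest predecessor platform, including the cost of moving data to p.
            src = min(platforms, key=lambda q: best[q][0] + transfer(q, p))
            total = best[src][0] + transfer(src, p) + costs[p]
            nxt[p] = (total, best[src][1] + [p])
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])


# Made-up costs: a small input is cheap to read and transform locally,
# but a heavy iterative step is much cheaper on a distributed platform.
OP_COSTS = [
    {"java": 1.0, "spark": 5.0},   # read input
    {"java": 2.0, "spark": 3.0},   # transform
    {"java": 20.0, "spark": 4.0},  # iterative analytics
]
MOVE_COST = {("java", "spark"): 2.0, ("spark", "java"): 2.0}

if __name__ == "__main__":
    print(cheapest_assignment(OP_COSTS, MOVE_COST))
```

With these assumed numbers, the mixed plan `['java', 'java', 'spark']` costs 9.0, beating both all-Spark (12.0) and all-Java (23.0), which mirrors, in miniature, the abstract's claim that a platform combination can outperform any single platform.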
Year: 2018
Venue: arXiv: Databases
Field: Query optimization, Data mining, Data processing, Spark (mathematics), Data analysis, Computer science, Complex data type, Cross-platform, Analytics, Scalability, Distributed computing
DocType: Journal
Volume: abs/1805.03533
Citations: 1
PageRank: 0.35
References: 24
Authors: 6
Name                         Order  Citations  PageRank
Sebastian Kruse              1      51         8.03
Zoi Kaoudi                   2      215        18.55
Jorge-Arnulfo Quiané-Ruiz    3      986        61.02
Sanjay Chawla                4      1372       105.09
Felix Naumann                5      1900       174.92
Bertty Contreras             6      1          0.35