Title
Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction.
Abstract
Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of "deforestation", which eliminate intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with over 10,000 times improvement observed in some cases.
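As a rough illustration of the deforestation idea mentioned in the abstract, the sketch below contrasts a two-pass computation that materialises an intermediate list with a fused single-pass equivalent. It uses plain Python lists rather than Spark, and it is not Hylas's implementation; the names `map_twice`, `map_fused`, `compose`, `double`, and `add_one` are hypothetical and chosen only for this example.

```python
def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def map_twice(f, g, xs):
    """Apply g, then f, in two passes.

    The first pass builds an intermediate list that exists only to be
    consumed by the second pass.
    """
    intermediate = list(map(g, xs))  # superfluous intermediate structure
    return list(map(f, intermediate))


def map_fused(f, g, xs):
    """Apply f after g in a single pass.

    Composing the two functions first means no intermediate list is built,
    while the result is the same for functions without side effects.
    """
    return list(map(compose(f, g), xs))


# Hypothetical example functions, used only to exercise the sketch.
def double(x):
    return x * 2


def add_one(x):
    return x + 1
```

Both versions return the same result for side-effect-free functions, which is what makes the rewrite semantics-preserving; the fused form simply avoids allocating and traversing the intermediate list.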
Year: 2016
DOI: 10.1145/2908961.2931692
Venue: GECCO (Companion)
Keywords: Apache Spark, Search-based Software Engineering, Program Transformation, Query Optimisation, Automatic Improvement Programming, Genetic Improvement
Field: SQL, Data structure, Spark (mathematics), Program transformation, Functional programming, Data analysis, Source code, Computer science, Artificial intelligence, Machine learning, Search-based software engineering
DocType: Conference
Citations: 4
PageRank: 0.42
References: 13
Authors (4)
Name | Order | Citations | PageRank
Zoltan A. Kocsis | 1 | 19 | 3.72
John H. Drake | 2 | 85 | 10.95
Douglas Carson | 3 | 4 | 0.42
Jerry Swan | 4 | 19 | 3.05