Title |
---|
Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction. |
Abstract |
---|
Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of "deforestation", which eliminate intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with over 10,000 times improvement observed in some cases. |
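
The deforestation idea mentioned in the abstract can be illustrated with a minimal sketch. It uses plain Python lists rather than the Spark API, and it is not taken from Hylas itself; the function names and the example pipeline are assumptions made purely for illustration. The first function materialises an intermediate list at every stage, while the second computes the same result in a single pass with no intermediate structures.

```python
def sum_squares_of_evens_naive(xs):
    # Hypothetical pipeline: each stage builds a full intermediate list.
    evens = [x for x in xs if x % 2 == 0]   # intermediate structure 1
    squares = [x * x for x in evens]        # intermediate structure 2
    return sum(squares)


def sum_squares_of_evens_fused(xs):
    # Deforested form: filter, map and sum are fused into one traversal,
    # so no intermediate list is ever allocated.
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total


data = list(range(10))
# The transformation is semantics-preserving: both forms agree.
assert sum_squares_of_evens_naive(data) == sum_squares_of_evens_fused(data)
```

Both functions return the same value for any input; only the amount of intermediate allocation differs, which is the kind of saving the abstract attributes to removing superfluous data structures.
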
Year | DOI | Venue |
---|---|---|
2016 | 10.1145/2908961.2931692 | GECCO (Companion) |
Keywords | Field | DocType |
---|---|---|
Apache Spark, Search-based Software Engineering, Program Transformation, Query Optimisation, Automatic Improvement Programming, Genetic Improvement | SQL, Data structure, Spark (mathematics), Program transformation, Functional programming, Data analysis, Source code, Computer science, Artificial intelligence, Machine learning, Search-based software engineering | Conference |
Citations | PageRank | References |
---|---|---|
4 | 0.42 | 13 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zoltan A. Kocsis | 1 | 19 | 3.72 |
John H. Drake | 2 | 85 | 10.95 |
Douglas Carson | 3 | 4 | 0.42 |
Jerry Swan | 4 | 19 | 3.05 |