Paper Info

Title
Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction.

Abstract
Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of "deforestation", which eliminate intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with over 10,000 times improvement observed in some cases.
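
The idea of deforestation mentioned in the abstract can be illustrated with a minimal sketch. This is not Hylas's actual transformation and it does not use Spark: it assumes plain Python lists, and the function names pipeline_with_intermediates and pipeline_fused are hypothetical, chosen only for illustration. The first version materialises an intermediate list at each stage; the second computes the same result in a single pass without building any intermediate structure.

```python
def pipeline_with_intermediates(xs):
    """Map, filter, then sum, building a full list at each stage."""
    doubled = [x * 2 for x in xs]              # intermediate list 1
    kept = [y for y in doubled if y % 3 == 0]  # intermediate list 2
    return sum(kept)


def pipeline_fused(xs):
    """Same computation fused into one loop, with no intermediate lists."""
    total = 0
    for x in xs:
        y = x * 2
        if y % 3 == 0:
            total += y
    return total


data = list(range(10))
# The rewrite is semantics-preserving: both versions agree on the result.
assert pipeline_with_intermediates(data) == pipeline_fused(data)
```

The two functions return the same value for any input, which is the sense in which such a rewrite preserves semantics; only the amount of intermediate data allocated differs.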

Year	DOI	Venue
2016	10.1145/2908961.2931692	GECCO (Companion)
Keywords	Field	DocType
Apache Spark, Search-based Software Engineering, Program Transformation, Query Optimisation, Automatic Improvement Programming, Genetic Improvement	SQL,Data structure,Spark (mathematics),Program transformation,Functional programming,Data analysis,Source code,Computer science,Artificial intelligence,Machine learning,Search-based software engineering	Conference
Citations	PageRank	References
4	0.42	13
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Zoltan A. Kocsis	1	19	3.72
John H. Drake	2	85	10.95
Douglas Carson	3	4	0.42
Jerry Swan	4	19	3.05