Title
Dynamic speculative optimizations for SQL compilation in Apache Spark
Abstract
Big-data systems have gained significant momentum, and Apache Spark is becoming a de-facto standard for modern data analytics. Spark relies on SQL query compilation to optimize the execution performance of analytical workloads on a variety of data sources. Despite its scalable architecture, Spark's SQL code generation suffers from significant runtime overheads related to data access and de-serialization. Such performance penalty can be significant, especially when applications operate on human-readable data formats such as CSV or JSON.
In this paper we present a new approach to query compilation that overcomes these limitations by relying on run-time profiling and dynamic code generation. Our new SQL compiler for Spark produces highly-efficient machine code, leading to speedups of up to 4.4x on the TPC-H benchmark with textual-form data formats such as CSV or JSON.
Year: 2020
DOI: 10.14778/3377369.3377382
Venue: Hosted Content
Field: SQL, Spark (mathematics), Computer science, Database
DocType: Journal
Volume: 13
Issue: 5
ISSN: 2150-8097
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name              Order  Citations  PageRank
Filippo Schiavio  1      0          0.34
Daniele Bonetta   2      81         12.87
Walter Binder     3      1077       92.58