Abstract
---
Big-data systems have gained significant momentum, and Apache Spark is becoming a de-facto standard for modern data analytics. Spark relies on SQL query compilation to optimize the execution performance of analytical workloads on a variety of data sources. Despite its scalable architecture, Spark's SQL code generation suffers from significant runtime overheads related to data access and de-serialization. This performance penalty can be substantial, especially when applications operate on human-readable data formats such as CSV or JSON.

In this paper we present a new approach to query compilation that overcomes these limitations by relying on run-time profiling and dynamic code generation. Our new SQL compiler for Spark produces highly efficient machine code, leading to speedups of up to 4.4x on the TPC-H benchmark with textual data formats such as CSV and JSON.
Year | DOI | Venue
---|---|---
2020 | 10.14778/3377369.3377382 | Hosted Content

Field | DocType | Volume
---|---|---
SQL, Apache Spark, Computer science, Database | Journal | 13

Issue | ISSN | Citations
---|---|---
5 | 2150-8097 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Filippo Schiavio | 1 | 0 | 0.34 |
Daniele Bonetta | 2 | 81 | 12.87 |
Walter Binder | 3 | 1077 | 92.58 |