Abstract
---
Big-data systems have gained significant momentum, and Apache Spark is becoming a de-facto standard for modern data analytics. Spark relies on SQL query compilation to optimize the execution performance of analytical workloads on a variety of data sources. Despite its scalable architecture, Spark's SQL code generation suffers from significant runtime overheads related to data access and de-serialization. This performance penalty can be substantial, especially when applications operate on human-readable data formats such as CSV or JSON.

In this paper we present a new approach to query compilation that overcomes these limitations by relying on run-time profiling and dynamic code generation. Our new SQL compiler for Spark produces highly efficient machine code, leading to speedups of up to 4.4x on the TPC-H benchmark with textual data formats such as CSV and JSON.
Year | DOI | Venue
---|---|---
2020 | 10.14778/3377369.3377382 | Hosted Content

Field | DocType | Volume
---|---|---
SQL, Apache Spark, Computer science, Database | Journal | 13

Issue | ISSN | Citations
---|---|---
5 | 2150-8097 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Filippo Schiavio | 1 | 0 | 0.34 |
Daniele Bonetta | 2 | 81 | 12.87 |
Walter Binder | 3 | 1077 | 92.58 |