Spark SQL: Relational Data Processing in Spark

Paper Info

Title
Spark SQL: Relational Data Processing in Spark

Abstract
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g. declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g. machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language, that makes it easy to add composable rules, control code generation, and define extension points. Using Catalyst, we have built a variety of features (e.g. schema inference for JSON, machine learning types, and query federation to external databases) tailored for the complex needs of modern data analysis. We see Spark SQL as an evolution of both SQL-on-Spark and of Spark itself, offering richer APIs and optimizations while keeping the benefits of the Spark programming model.
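The abstract's description of Catalyst as an optimizer that "makes it easy to add composable rules" can be illustrated with a small sketch. The code below is not Catalyst and not part of Spark's API. It is a plain-Python toy, assuming a hypothetical three-node expression tree (Literal, Attribute, Add) and two hypothetical rules (fold_constants, drop_zero), that shows the general idea of rewriting a tree by applying independent rules until none of them matches.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Union


# Hypothetical expression nodes, invented for this illustration only.
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]
# A rule returns a rewritten node, or None when it does not apply.
Rule = Callable[[Expr], Optional[Expr]]


def fold_constants(e: Expr) -> Optional[Expr]:
    """Rewrite Add(Literal(a), Literal(b)) to Literal(a + b)."""
    if isinstance(e, Add) and isinstance(e.left, Literal) and isinstance(e.right, Literal):
        return Literal(e.left.value + e.right.value)
    return None


def drop_zero(e: Expr) -> Optional[Expr]:
    """Rewrite x + 0 or 0 + x to x."""
    if isinstance(e, Add):
        if e.left == Literal(0):
            return e.right
        if e.right == Literal(0):
            return e.left
    return None


def transform(e: Expr, rules: Sequence[Rule]) -> Expr:
    """Rewrite children first, then apply rules to this node until none fires."""
    if isinstance(e, Add):
        e = Add(transform(e.left, rules), transform(e.right, rules))
    for rule in rules:
        rewritten = rule(e)
        if rewritten is not None:
            # A rewrite may expose new opportunities, so transform again.
            return transform(rewritten, rules)
    return e


# (x + (1 + 2)) + 0
tree = Add(Add(Attribute("x"), Add(Literal(1), Literal(2))), Literal(0))
optimized = transform(tree, [fold_constants, drop_zero])
```

Each rule here is a standalone function, so adding an optimization means appending one more function to the list passed to transform. That is the sense of "composable" this sketch is meant to convey, under the assumptions stated above.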

Year	DOI	Venue
2015	10.1145/2723372.2742797	ACM SIGMOD Conference
Keywords	Field	DocType
Databases,Data Warehouse,Machine Learning,Spark,Hadoop	SQL,Data mining,Programming language,Scala,Spark (mathematics),Functional programming,Relational database,Programming paradigm,Computer science,Code generation,JSON,Database	Conference
Citations	PageRank	References
307	9.13	23
Authors
11

Authors (11 rows)

Name	Order	Citations	PageRank
Michael Armbrust	1	2434	109.69
Reynold Xin	2	2171	81.33
Cheng Lian	3	312	9.99
Yin Huai	4	579	21.77
Davies Liu	5	584	19.25
Joseph K. Bradley	6	668	25.59
Xiangrui Meng	7	1080	40.90
Tomer Kaftan	8	316	9.98
Michael J. Franklin	9	17423	1681.10
Ali Ghodsi	10	3306	156.01
Matei Zaharia	11	9101	407.89
