Title
Extending Apache Spark with a Mediation Layer
Abstract
With the recent growth of data volumes across many disciplines in both industry and academia, many new Big Data Management systems have emerged to provide scalable tools for efficient data storage, processing, and analysis. However, most of these systems offer little support for efficiently integrating multiple external sources under a uniform schema and a single query access point, a capability that greatly simplifies downstream analytics. In this work, we present Spark Mediator, a system that extends the logical data integration capabilities of Apache Spark. As a use case, we apply Spark Mediator to the integration of schizophrenia neuroimaging data and compare it with previous data integration systems.
Year: 2018
DOI: 10.1145/3208352.3208354
Venue: SBD@SIGMOD
Field: Data integration, Data science, Spark (mathematics), Computer science, Logical data model, Big data management, Mediation (Marxist theory and media studies), Analytics, Schema (psychology), Database, Scalability
DocType: Conference
ISBN: 978-1-4503-5779-1
Citations: 0
PageRank: 0.34
References: 19
Authors: 3
Name                        Order   Citations   PageRank
Dimitris Stripelis          1       0           1.01
Chrysovalantis Anastasiou   2       3           1.74
José Luis Ambite            3       958         110.89