Title
SPARQL Graph Pattern Processing with Apache Spark.
Abstract
A common way to achieve scalability for processing SPARQL queries is to choose MapReduce frameworks like Hadoop or Spark. Processing basic graph pattern (BGP) expressions, which generate large join plans over distributed data partitions, is a major challenge in these frameworks. In this article, we study the use of two distributed join algorithms, partitioned join and broadcast join, for the evaluation of BGP expressions on top of Apache Spark. We compare five possible implementations and illustrate the importance of carefully choosing the physical data storage layer and of being able to use both join algorithms so that existing data partitioning schemes are taken into account efficiently. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads emphasize that hybrid join plans introduce more flexibility and often achieve better performance than single-kind join plans.
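To make the two join strategies named in the abstract concrete, the following is a minimal single-process sketch in plain Python. It is not the authors' implementation and does not use the Spark API: partitions are simulated with lists, and the function names (`partitioned_join`, `broadcast_join`), the `num_partitions` parameter, and the sample bindings are all assumptions chosen for illustration. It joins the bindings of two triple patterns, `?x worksFor ?org` and `?org locatedIn ?city`, on the shared variable `org`.

```python
from collections import defaultdict


def partitioned_join(left, right, key, num_partitions=4):
    """Hash-partition BOTH inputs on the join key, then join each partition pair.

    Simulates a shuffle: rows with the same key value land in the same partition.
    """
    def shuffle(rows):
        parts = defaultdict(list)
        for row in rows:
            parts[hash(row[key]) % num_partitions].append(row)
        return parts

    left_parts, right_parts = shuffle(left), shuffle(right)
    out = []
    for p in range(num_partitions):
        # Build a local hash index on the right side of this partition.
        index = defaultdict(list)
        for r in right_parts.get(p, []):
            index[r[key]].append(r)
        for l in left_parts.get(p, []):
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out


def broadcast_join(large_partitions, small, key):
    """Copy the whole small input to every partition of the large input.

    The large side keeps its existing partitioning; only `small` is moved.
    """
    index = defaultdict(list)
    for r in small:
        index[r[key]].append(r)
    out = []
    for part in large_partitions:
        for l in part:
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out


def canon(bindings):
    """Order-independent form of a result set, for comparison."""
    return sorted(tuple(sorted(b.items())) for b in bindings)


# Hypothetical bindings for ?x worksFor ?org and ?org locatedIn ?city.
works_for = [
    {"x": "alice", "org": "acme"},
    {"x": "bob", "org": "acme"},
    {"x": "carol", "org": "initech"},
    {"x": "dave", "org": "unknown"},
]
located_in = [
    {"org": "acme", "city": "Paris"},
    {"org": "initech", "city": "Lyon"},
]

pjoin_result = partitioned_join(works_for, located_in, "org")
# The large side is already split into two partitions (here by position).
bjoin_result = broadcast_join([works_for[:2], works_for[2:]], located_in, "org")
```

Both functions return the same bindings; they differ in what would be moved across the network in a distributed setting. The partitioned variant redistributes both inputs, while the broadcast variant moves only the small input, which is why a plan that can mix the two may exploit an existing partitioning of the data.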
Year: 2017
DOI: 10.1145/3078447.3078448
Venue: GRADES@SIGMOD/PODS
Field: Hash join, Data mining, Broadcasting, Spark (mathematics), Recursive join, Expression (mathematics), Computer science, SPARQL, Sort-merge join, Theoretical computer science, Database, Scalability
DocType: Conference
Citations: 1
PageRank: 0.36
References: 13
Authors: 3
Name            Order  Citations  PageRank
Hubert Naacke   1      128        25.41
Bernd Amann     2      425        59.99
Olivier Curé    3      1          0.36