SPARQL Graph Pattern Processing with Apache Spark. - Citegraph

Paper Info

Title
SPARQL Graph Pattern Processing with Apache Spark.

Abstract
A common way to achieve scalability when processing SPARQL queries is to use MapReduce-style frameworks such as Hadoop or Spark. Processing basic graph pattern (BGP) expressions, which generate large join plans over distributed data partitions, is a major challenge in these frameworks. In this article, we study the use of two distributed join algorithms, partitioned join and broadcast join, for the evaluation of BGP expressions on top of Apache Spark. We compare five possible implementations and illustrate the importance of carefully choosing the physical data storage layer and of being able to use both join algorithms in order to take existing data partitioning schemes into account efficiently. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads show that hybrid join plans introduce more flexibility and often achieve better performance than join plans that use a single kind of join.
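
To make the two join strategies named in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation and does not use Spark: partitions are simulated with lists, and all function names (`hash_partition`, `local_hash_join`, `partitioned_join`, `broadcast_join`) and the sample bindings are hypothetical. It assumes each triple pattern has already been matched into a list of variable bindings, and that the BGP joins two patterns on a shared variable `y`.

```python
from collections import defaultdict


def hash_partition(rows, key, n):
    """Simulate a shuffle: route each binding to a partition by hashing the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_hash_join(left, right, key):
    """Join two in-memory lists of bindings on a shared variable."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def partitioned_join(left, right, key, n=4):
    """Partitioned join: both inputs are repartitioned on the join key,
    then partitions with the same index are joined locally."""
    lp = hash_partition(left, key, n)
    rp = hash_partition(right, key, n)
    out = []
    for i in range(n):
        out.extend(local_hash_join(lp[i], rp[i], key))
    return out


def broadcast_join(large_parts, small, key):
    """Broadcast join: the small input is copied to every partition of the
    large input, whose existing partitioning is left untouched."""
    out = []
    for part in large_parts:
        out.extend(local_hash_join(part, small, key))
    return out


# Hypothetical bindings for the BGP: ?x worksFor ?y . ?y locatedIn ?z
works_for = [
    {"x": "alice", "y": "acme"},
    {"x": "bob", "y": "initech"},
    {"x": "carol", "y": "acme"},
]
located_in = [
    {"y": "acme", "z": "paris"},
    {"y": "globex", "z": "rome"},
]

# The large side is already partitioned on ?x, not on the join variable ?y.
works_for_parts = hash_partition(works_for, "x", 2)

result_partitioned = partitioned_join(works_for, located_in, "y")
result_broadcast = broadcast_join(works_for_parts, located_in, "y")
```

Both calls produce the same set of bindings. The difference is in data movement: the partitioned join reshuffles both inputs on `y`, while the broadcast join only replicates the small input and reuses whatever partitioning the large input already has. A hybrid plan, in the sense the abstract describes, would choose between the two per join.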

Year	DOI	Venue
2017	10.1145/3078447.3078448	GRADES@SIGMOD/PODS
Field	DocType	Citations
Hash join,Data mining,Broadcasting,Spark (mathematics),Recursive join,Expression (mathematics),Computer science,SPARQL,Sort-merge join,Theoretical computer science,Database,Scalability	Conference	1
PageRank	References	Authors
0.36	13	3

Authors (3 rows)

Name	Order	Citations	PageRank
Hubert Naacke	1	128	25.41
Bernd Amann	2	425	59.99
Olivier Curé	3	1	0.36
