Towards efficient join processing over large RDF graph using mapreduce - Citegraph

Paper Info

Title
Towards efficient join processing over large RDF graph using mapreduce

Abstract
Existing solutions for answering SPARQL queries in a shared-nothing environment using MapReduce fail to fully exploit the substantial scalability and parallelism of the computing framework. In this paper, we propose a cost-model-based RDF join processing solution using MapReduce to minimize query response time as much as possible. After transforming a SPARQL query into a sequence of MapReduce jobs, we propose a novel index structure, called the All Possible Join tree (APJ-tree), to reduce the search space for the optimal execution plan of a set of MapReduce jobs. To speed up join processing, we employ hybrid join and Bloom filters for performance optimization. Extensive experiments on real data sets proved the effectiveness of our cost model. Our solution saves as much as an order of magnitude in time compared with state-of-the-art solutions.
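
The abstract names Bloom filters as a join optimization but does not say how they are applied. The sketch below shows one common pattern and is not the authors' method: the smaller join input builds a Bloom filter on the join key, and rows of the larger input that cannot match are discarded before the join (in a MapReduce job, before the shuffle). The names `BloomFilter` and `bloom_filtered_join`, the filter sizes, and the sample data are all hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; the sizes are illustrative assumptions, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_filtered_join(small, large):
    """Join two lists of (join_key, value) pairs on join_key.

    Returns the joined rows and the number of rows of `large` that
    survived the Bloom filter (the rows that would still be shuffled).
    """
    bloom = BloomFilter()
    index = {}
    for key, value in small:
        bloom.add(key)
        index.setdefault(key, []).append(value)

    # Drop rows of the larger input whose key cannot appear in `small`.
    survivors = [(k, v) for k, v in large if bloom.might_contain(k)]

    # The exact lookup removes any false positives the filter let through.
    joined = [(k, sv, lv) for k, lv in survivors for sv in index.get(k, [])]
    return joined, len(survivors)


# Hypothetical bindings for two triple patterns sharing the variable ?paper.
authors = [("paper1", "Alice"), ("paper2", "Bob")]
years = [("paper1", 2012), ("paper3", 2010), ("paper2", 2011), ("paper4", 2009)]
rows, kept = bloom_filtered_join(authors, years)
```

Because a Bloom filter never produces false negatives, the join result is exact; false positives only mean that a few non-matching rows are carried further than necessary, which is why the final lookup against the exact index is still required.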

Year: 2012
DOI: 10.1007/978-3-642-31235-9_16
Venue: Lecture Notes in Computer Science
Keywords: processing solution, sparql query, bloom filter, possible join tree, computing framework, magnitude time, art solution, cost model, large rdf graph, extensive experiment, mapreduce job
Field: Hash join, Bloom filter, Data mining, Recursive join, Computer science, Theoretical computer science, Sort-merge join, SPARQL, RDF, Database, Speedup, Scalability
DocType: Conference
Volume: 7338
ISSN: 0302-9743
Citations: 14
PageRank: 0.54
References: 14
Authors: 3

Authors (3 rows)

Name	Order	Citations	PageRank
Xiaofei Zhang	1	71	2.87
Lei Chen	2	6239	395.84
Min WANG	3	1662	192.58