Title
Towards efficient join processing over large RDF graph using MapReduce
Abstract
Existing solutions for answering SPARQL queries in a shared-nothing environment using MapReduce fail to fully exploit the scalability and parallelism of the computing framework. In this paper, we propose a cost-model-based RDF join processing solution using MapReduce to minimize query response time. After transforming a SPARQL query into a sequence of MapReduce jobs, we propose a novel index structure, called the All Possible Join tree (APJ-tree), to reduce the search space for the optimal execution plan of a set of MapReduce jobs. To speed up join processing, we employ hybrid join and Bloom filters for performance optimization. Extensive experiments on real data sets demonstrate the effectiveness of our cost model. Our solution achieves as much as an order of magnitude of time savings compared with state-of-the-art solutions.
Year
2012
DOI
10.1007/978-3-642-31235-9_16
Venue
Lecture Notes in Computer Science
Keywords
processing solution,sparql query,bloom filter,possible join tree,computing framework,magnitude time,art solution,cost model,large rdf graph,extensive experiment,mapreduce job
Field
Hash join,Bloom filter,Data mining,Recursive join,Computer science,Theoretical computer science,Sort-merge join,SPARQL,RDF,Database,Speedup,Scalability
DocType
Conference
Volume
7338
ISSN
0302-9743
Citations
14
PageRank
0.54
References
14
Authors
3
Name            Order  Citations  PageRank
Xiaofei Zhang   1      71         2.87
Lei Chen        2      6239       395.84
Min WANG        3      1662       192.58