Title
Using Graph Summarization for Join-Ahead Pruning in a Distributed RDF Engine
Abstract
Demand for scalable and efficient RDF stores has grown recently, and many efficient systems, both centralized and distributed, have been proposed. Because SPARQL requires row-oriented output, most current systems rely on relational joins. Relational joins, however, suffer from a performance bottleneck: they generate large intermediate relations, which could be avoided by using more accurate data and pruning statistics. To address this problem, several recently proposed systems employ bisimulation-based graph summaries (adopted from XML indexing) over large RDF graphs to facilitate join-ahead pruning. In this paper, we discuss a different, locality-based graph summarization approach for RDF data and highlight its use for join-ahead pruning in a distributed SPARQL engine. Based on our recently developed TriAD engine, we present a detailed comparison of processing techniques for these graph summaries over the synthetic LUBM benchmark.
Year
2014
DOI
10.1145/2630602.2630610
Venue
SWIM
Keywords
algorithms, design, graphs and networks, experimentation, semantic networks, content analysis and indexing, measurement, world wide web, performance
Field
Data mining, Bottleneck, Joins, Computer science, SPARQL, Bisimulation, RDF Schema, RDF, Pruning, Scalability
DocType
Conference
Citations
4
PageRank
0.47
References
14
Authors
4
Name              Order  Citations  PageRank
Sairam Gurajada   1      118        7.83
Stephan Seufert   2      4          0.47
Iris Miliaraki    3      4          0.47
Martin Theobald   4      1474       72.06