Data placement strategies that speed-up distributed graph query processing - Citegraph

Paper Info

Title
Data placement strategies that speed-up distributed graph query processing

Abstract
We consider the problem of how to optimize the data distribution to improve query performance in distributed RDF stores running on compute node clusters. When hash-based data distribution strategies are used, the query workload tends to be equally balanced among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both strategies. To investigate this hypothesis, we analyze two such data distribution strategies: (1) overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates our hypothesis by explaining the causes of their good performance. Both strategies reduce query execution time on our set of test queries (by between 5% and 98%). While overpartitioned minimal edge-cut cover fares best when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability with major improvements in query execution time over various baseline strategies.

Field	Value
Year	2020
DOI	10.1145/3391274.3393633
Venue	SIGMOD/PODS '20: International Conference on Management of Data Portland Oregon June, 2020
DocType	Conference
ISBN	978-1-4503-7974-8
Citations	0
PageRank	0.34
References	6
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Daniel Dominik Janke	1	6	3.20
Steffen Staab	2	6658	593.89
Martin Leinberger	3	23	5.94
