Distributed Top-K Join Queries Optimizing for RDF Datasets. - Citegraph

Paper Info

Title
Distributed Top-K Join Queries Optimizing for RDF Datasets.

Abstract
In recent years, RDF datasets have grown rapidly in scale, and research on querying RDF datasets in traditional centralized environments can no longer meet the growing demands of data querying, especially for top-k queries. Based on the Spark distributed computing system and the HBase distributed storage system, a novel method for top-k queries is proposed. A top-k query plan, STA (Spark Threshold Algorithm), is proposed to reduce the join operations on RDF data. Furthermore, an improved algorithm, SSJA (Spark Simple Join Algorithm), is presented to reduce sorting-related operations on intermediate data. A cache mechanism is also proposed to speed up SSJA. The experimental results show that SSJA outperforms STA in terms of cost and applicability, and that introducing the cache mechanism significantly improves SSJA's performance.
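The name STA (Spark Threshold Algorithm) points to the threshold-algorithm (TA) family of top-k methods. As a point of reference only, here is a minimal single-machine sketch of the classic Fagin-style TA applied to a top-k join; it is not the authors' Spark/HBase implementation. The function name `threshold_topk`, the use of sum as the scoring function, and the layout of each input as a list of `(binding, score)` pairs sorted by descending score (for example, the matches of one triple pattern) are all illustrative assumptions.

```python
import heapq
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]  # (binding, score) pairs


def threshold_topk(lists: List[Scored], k: int) -> Scored:
    """Top-k join via a classic Threshold Algorithm with sum aggregation.

    Illustrative assumptions (not from the paper): each list is sorted by
    descending score, holds each binding at most once, and a binding is a
    join result only if it occurs in every list.
    """
    if k <= 0 or not lists:
        return []
    # Random-access lookup per list (a stand-in for a keyed store lookup).
    index: List[Dict[str, float]] = [dict(lst) for lst in lists]
    top: List[Tuple[float, str]] = []  # min-heap holding the best k (score, binding)
    seen = set()
    for depth in range(min(len(lst) for lst in lists)):
        threshold = 0.0
        for lst in lists:
            binding, score = lst[depth]  # sorted access, one row per list
            threshold += score
            if binding in seen:
                continue
            seen.add(binding)
            # Random access: keep the binding only if it joins across all lists.
            if all(binding in idx for idx in index):
                total = sum(idx[binding] for idx in index)
                if len(top) < k:
                    heapq.heappush(top, (total, binding))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, binding))
        # No unseen binding can score above the sum of the last scores read.
        if len(top) == k and top[0][0] >= threshold:
            break
    # Once the shortest list is exhausted, every possible join result has been seen.
    return [(b, s) for s, b in sorted(top, key=lambda t: (-t[0], t[1]))]


p1 = [("a", 0.75), ("b", 0.5), ("c", 0.25), ("d", 0.125)]
p2 = [("b", 1.0), ("d", 0.75), ("a", 0.5), ("c", 0.25)]
print(threshold_topk([p1, p2], 2))  # [('b', 1.5), ('a', 1.25)]
```

In this example the loop stops after reading only two rows from each list, because the second-best score found so far (1.25) already matches the threshold. Avoiding full scans in this way is the general benefit threshold-style plans offer for top-k joins; how STA realizes it on Spark and HBase is not described in the abstract.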

Year	DOI	Venue
2017	10.4018/IJWSR.2017070105	Int. J. Web Service Res.
Keywords	Field	DocType
Distributed Optimization, RDF Datasets, Spark, Top-k Query	Data mining, Spark (mathematics), Computer science, Cache, Distributed data store, Sort-merge join, Sorting, RDF Schema, RDF, Query plan	Journal
Volume	Issue	ISSN
14	3	1545-7362
Citations	PageRank	References
1	0.37	11
Authors
4

Authors

Name	Order	Citations	PageRank
Jinguang Gu	1	46	13.50
Hao Dong	2	8	5.01
Zhao Liu	3	25	10.73
Fangfang Xu	4	1	1.05
