Scalable Distributed Top-k Join Queries in Topic-Based Pub/Sub Systems - Citegraph

Paper Info

Title
Scalable Distributed Top-k Join Queries in Topic-Based Pub/Sub Systems

Abstract
In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be analyzed by the stream processing operators. The main idea is that brokers individually invoke the query on their received messages and forward the top-k results to a stream processing operator that performs the merging of the results and provides to the end-user the final top-k results. Moreover, our system exploits the Bayesian Optimization technique to determine automatically the number of top-k results that should be provided by each broker. Our approach has been developed in the Kappa architecture that exploits topic-based scalable publish/subscribe (pub/sub) systems like Apache Kafka to efficiently forward the high volume of incoming messages to distributed processing systems (i.e., Apache Spark or Apache Flink) that perform the batch and stream analytics operations. Our detailed experimental evaluation on our local cluster illustrates that we can efficiently execute top-k join queries on our system with high accuracy and low latency.

Year	DOI	Venue
2018	10.1109/BigData.2018.8621949	2018 IEEE International Conference on Big Data (Big Data)
Keywords	Field	DocType
pub/sub systems,stream processing,Kappa architecture,top-k joins	Data mining,Spark (mathematics),Computer science,Bayesian optimization,Exploit,Operator (computer programming),Latency (engineering),Analytics,Stream processing,Distributed computing,Scalability	Conference
ISSN	ISBN	Citations
2639-1589	978-1-5386-5036-3	0
PageRank	References	Authors
0.34	0	3

Authors (3 rows)

Cited by (0 rows)

References (0 rows)

Name	Order	Citations	PageRank
Nikos Zacheilas	1	79	9.40
Dimitris Dedousis	2	0	0.68
Vana Kalogeraki	3	1686	124.40

1