Title
Scalable Distributed Top-k Join Queries in Topic-Based Pub/Sub Systems
Abstract
In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be analyzed by the stream processing operators. The main idea is that brokers individually invoke the query on their received messages and forward the top-k results to a stream processing operator that performs the merging of the results and provides to the end-user the final top-k results. Moreover, our system exploits the Bayesian Optimization technique to determine automatically the number of top-k results that should be provided by each broker. Our approach has been developed in the Kappa architecture that exploits topic-based scalable publish/subscribe (pub/sub) systems like Apache Kafka to efficiently forward the high volume of incoming messages to distributed processing systems (i.e., Apache Spark or Apache Flink) that perform the batch and stream analytics operations. Our detailed experimental evaluation on our local cluster illustrates that we can efficiently execute top-k join queries on our system with high accuracy and low latency.
Year
DOI
Venue
2018
10.1109/BigData.2018.8621949
2018 IEEE International Conference on Big Data (Big Data)
Keywords
Field
DocType
pub/sub systems,stream processing,Kappa architecture,top-k joins
Data mining,Spark (mathematics),Computer science,Bayesian optimization,Exploit,Operator (computer programming),Latency (engineering),Analytics,Stream processing,Distributed computing,Scalability
Conference
ISSN
ISBN
Citations 
2639-1589
978-1-5386-5036-3
0
PageRank 
References 
Authors
0.34
0
3
Name
Order
Citations
PageRank
Nikos Zacheilas1799.40
Dimitris Dedousis200.68
Vana Kalogeraki31686124.40