Abstract | ||
---|---|---|
In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be analyzed by the stream processing operators. The main idea is that brokers individually invoke the query on their received messages and forward the top-k results to a stream processing operator that performs the merging of the results and provides to the end-user the final top-k results. Moreover, our system exploits the Bayesian Optimization technique to determine automatically the number of top-k results that should be provided by each broker. Our approach has been developed in the Kappa architecture that exploits topic-based scalable publish/subscribe (pub/sub) systems like Apache Kafka to efficiently forward the high volume of incoming messages to distributed processing systems (i.e., Apache Spark or Apache Flink) that perform the batch and stream analytics operations. Our detailed experimental evaluation on our local cluster illustrates that we can efficiently execute top-k join queries on our system with high accuracy and low latency. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/BigData.2018.8621949 | 2018 IEEE International Conference on Big Data (Big Data) |
Keywords | Field | DocType |
pub/sub systems,stream processing,Kappa architecture,top-k joins | Data mining,Spark (mathematics),Computer science,Bayesian optimization,Exploit,Operator (computer programming),Latency (engineering),Analytics,Stream processing,Distributed computing,Scalability | Conference |
ISSN | ISBN | Citations |
2639-1589 | 978-1-5386-5036-3 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nikos Zacheilas | 1 | 79 | 9.40 |
Dimitris Dedousis | 2 | 0 | 0.68 |
Vana Kalogeraki | 3 | 1686 | 124.40 |