Title
Efficient processing of top-k joins in MapReduce
Abstract
Top-k join is an essential tool for data analysis, since it enables selective retrieval of the k best combined results that come from multiple different input datasets. In the context of Big Data, processing top-k joins over huge datasets requires a scalable platform, such as the widely popular MapReduce framework. However, such a solution does not necessarily imply efficient processing, due to inherent limitations of MapReduce. In particular, these include the lack of an early termination mechanism for accessing only a subset of the input data, as well as the lack of an appropriate load balancing mechanism tailored to the top-k join problem. Apart from these issues, a significant research problem is how to determine the subset of the inputs that is guaranteed to produce the correct top-k join result. In this paper, we address these challenges by proposing an algorithm for efficient top-k join processing in MapReduce. Our experimental evaluation clearly demonstrates the efficiency of our approach, which compromises neither scalability nor any other salient feature of MapReduce processing.
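To make the problem statement concrete, the following is a minimal single-machine sketch of a top-k join and of the early termination idea the abstract refers to. It is not the paper's MapReduce algorithm. It assumes that each input is a list of (join key, score) pairs, that higher scores are better, and that the combined score of a joined pair is the sum of the two scores. The function names `naive_top_k_join` and `top_k_join_early_stop` are hypothetical, and the stopping rule is a generic threshold bound over score-sorted inputs.

```python
import heapq
from collections import defaultdict


def naive_top_k_join(left, right, k):
    """Join on key, score each pair by the sum of its scores, keep the k best."""
    by_key = defaultdict(list)
    for key, score in right:
        by_key[key].append(score)
    joined = ((ls + rs, key) for key, ls in left for rs in by_key[key])
    return heapq.nlargest(k, joined)


def top_k_join_early_stop(left, right, k):
    """Same result scores as the naive join, but stop reading input as soon as
    no unread tuple can still enter the top k. Returns (results, tuples_read)."""
    if not left or not right or k <= 0:
        return [], 0
    # Assumption: inputs can be accessed in descending score order.
    left = sorted(left, key=lambda t: -t[1])
    right = sorted(right, key=lambda t: -t[1])
    seen_l, seen_r = defaultdict(list), defaultdict(list)
    best = []  # min-heap holding at most k (combined_score, key) entries
    i = j = 0
    while i < len(left) or j < len(right):
        # Alternate between the two inputs while both still have tuples.
        if j >= len(right) or (i < len(left) and i <= j):
            key, score = left[i]
            i += 1
            seen_l[key].append(score)
            partners = seen_r[key]
        else:
            key, score = right[j]
            j += 1
            seen_r[key].append(score)
            partners = seen_l[key]
        for other in partners:
            heapq.heappush(best, (score + other, key))
            if len(best) > k:
                heapq.heappop(best)
        # Upper bound on the score of any join result not produced yet:
        # it must use an unread left tuple or an unread right tuple.
        next_l = left[i][1] if i < len(left) else float("-inf")
        next_r = right[j][1] if j < len(right) else float("-inf")
        bound = max(next_l + right[0][1], left[0][1] + next_r)
        if len(best) == k and best[0][0] >= bound:
            break
    return sorted(best, reverse=True), i + j


left = [("a", 90), ("b", 80), ("c", 30), ("d", 10)]
right = [("a", 70), ("c", 60), ("b", 50), ("d", 20)]
results, tuples_read = top_k_join_early_stop(left, right, k=1)
print(results, tuples_read)
```

With these sample inputs the early-stop variant can return the best pair after reading only part of the eight input tuples, which is the kind of selective access that plain MapReduce does not provide out of the box.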
Year
2016
DOI
10.1109/BigData.2016.7840649
Venue
2016 IEEE International Conference on Big Data (Big Data)
Keywords
top-k join processing, data analysis, Big Data, early termination mechanism, load balancing mechanism, MapReduce processing
Field
Load management, Data mining, Histogram, Joins, Algorithm design, Computer science, Load balancing (computing), Big data, Scalability, Distributed computing, Salient
DocType
Conference
ISBN
978-1-4673-9006-4
Citations
1
PageRank
0.35
References
10
Authors
4
Name | Order | Citations | PageRank
Mei Saouk | 1 | 1 | 0.69
Christos Doulkeridis | 2 | 899 | 55.91
Akrivi Vlachou | 3 | 751 | 39.95
Kjetil Noervaag | 4 | 1 | 0.35