Abstract |
---|
Top-k join is an essential tool for data analysis, since it enables selective retrieval of the k best combined results drawn from multiple input datasets. In the context of Big Data, processing top-k joins over huge datasets requires a scalable platform, such as the widely popular MapReduce framework. However, such a solution does not necessarily imply efficient processing, due to inherent limitations of MapReduce. In particular, these include the lack of an early termination mechanism for accessing only a subset of the input data, as well as the lack of a load balancing mechanism tailored to the top-k join problem. Apart from these issues, a significant research problem is how to determine the subset of the inputs that is guaranteed to produce the correct top-k join result. In this paper, we address these challenges by proposing an algorithm for efficient top-k join processing in MapReduce. Our experimental evaluation demonstrates the efficiency of our approach, which compromises neither scalability nor any other salient feature of MapReduce processing. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/BigData.2016.7840649 | 2016 IEEE International Conference on Big Data (Big Data) |
Keywords | Field | DocType |
---|---|---|
top-k join processing, data analysis, Big Data, early termination mechanism, load balancing mechanism, MapReduce processing | Load management, Data mining, Histogram, Joins, Algorithm design, Computer science, Load balancing (computing), Big data, Scalability, Distributed computing, Salient | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-4673-9006-4 | 1 | 0.35 |
References | Authors |
---|---|
10 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mei Saouk | 1 | 1 | 0.69 |
Christos Doulkeridis | 2 | 899 | 55.91 |
Akrivi Vlachou | 3 | 751 | 39.95 |
Kjetil Noervaag | 4 | 1 | 0.35 |