Efficient processing of top-k joins in MapReduce

Paper Info

Title
Efficient processing of top-k joins in MapReduce

Abstract
Top-k join is an essential tool for data analysis, since it enables selective retrieval of the k best combined results drawn from multiple input datasets. In the context of Big Data, processing top-k joins over huge datasets requires a scalable platform, such as the widely used MapReduce framework. However, adopting such a platform does not by itself guarantee efficient processing, due to limitations inherent in MapReduce. In particular, MapReduce lacks an early termination mechanism for accessing only a subset of the input data, as well as a load balancing mechanism tailored to the top-k join problem. Beyond these issues, a significant research problem is how to determine the subset of the inputs that is guaranteed to produce the correct top-k join result. In this paper, we address these challenges by proposing an algorithm for efficient top-k join processing in MapReduce. Our experimental evaluation clearly demonstrates the efficiency of our approach, which compromises neither its scalability nor any other salient feature of MapReduce processing.
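To make the notions of a top-k join and early termination concrete, here is a minimal single-machine sketch. It is not the paper's MapReduce algorithm: it swaps in a classic threshold-based rank join (in the style of HRJN), assuming two inputs of (join_key, score) pairs, an equi-join on join_key, and summation as the scoring function. The function names `rank_join` and `naive_topk_join` are hypothetical.

```python
import heapq
from collections import defaultdict

# Assumptions (illustrative only, not the paper's MapReduce method):
# equi-join on join_key, combined score = left score + right score.
NEG_INF = float("-inf")


def naive_topk_join(left, right, k):
    """Baseline: materialize the full equi-join, then keep the k best by summed score."""
    right_by_key = defaultdict(list)
    for key, score in right:
        right_by_key[key].append(score)
    joined = [(ls + rs, key) for key, ls in left for rs in right_by_key[key]]
    return heapq.nlargest(k, joined)


def rank_join(left, right, k):
    """Threshold-based top-k equi-join with early termination (HRJN-style sketch).

    Returns (top-k results by descending score, number of input tuples read).
    """
    inputs = (sorted(left, key=lambda t: -t[1]), sorted(right, key=lambda t: -t[1]))
    if k <= 0 or not inputs[0] or not inputs[1]:
        return [], 0
    top = (inputs[0][0][1], inputs[1][0][1])       # best score on each side
    seen = (defaultdict(list), defaultdict(list))  # join_key -> scores read so far
    pos = [0, 0]                                   # next unread index per side
    best = []                                      # min-heap of the k best (score, key)
    side = 0
    while pos[0] < len(inputs[0]) or pos[1] < len(inputs[1]):
        if pos[side] == len(inputs[side]):         # skip an exhausted input
            side = 1 - side
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        seen[side][key].append(score)
        # Join the new tuple with every matching tuple already read from the other side.
        for other in seen[1 - side][key]:
            heapq.heappush(best, (score + other, key))
            if len(best) > k:
                heapq.heappop(best)
        # Inputs are sorted, so an unread tuple scores at most its side's next tuple.
        nxt = [inputs[s][pos[s]][1] if pos[s] < len(inputs[s]) else NEG_INF for s in (0, 1)]
        # Any result not produced yet contains at least one unread tuple.
        threshold = max(nxt[0] + top[1], top[0] + nxt[1])
        if len(best) == k and best[0][0] >= threshold:
            break                                  # early termination
        side = 1 - side                            # alternate between inputs
    return sorted(best, reverse=True), sum(pos)


if __name__ == "__main__":
    left = [("a", 9), ("b", 8), ("c", 3), ("d", 2), ("e", 1)]
    right = [("a", 7), ("b", 6), ("c", 5), ("d", 1), ("e", 1)]
    print(rank_join(left, right, 2))  # ([(16, 'a'), (14, 'b')], 4)
```

On this sample data the rank join returns the same two results as the full join after reading only 4 of the 10 input tuples. The stopping rule relies on sorted, tuple-at-a-time access to each input; in MapReduce each mapper normally scans its whole input split, which is why, as the abstract notes, early termination does not come for free there.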

Year: 2016
DOI: 10.1109/BigData.2016.7840649
Venue: 2016 IEEE International Conference on Big Data (Big Data)
Keywords: top-k join processing, data analysis, Big Data, early termination mechanism, load balancing mechanism, MapReduce processing
Field: Load management, Data mining, Histogram, Joins, Algorithm design, Computer science, Load balancing (computing), Big data, Scalability, Distributed computing, Salient
DocType: Conference
ISBN: 978-1-4673-9006-4
Citations: 1
PageRank: 0.35
References: 10
Authors: 4

Authors

Name	Order	Citations	PageRank
Mei Saouk	1	1	0.69
Christos Doulkeridis	2	899	55.91
Akrivi Vlachou	3	751	39.95
Kjetil Noervaag	4	1	0.35
