Abstract |
---|
The effectiveness and scalability of MapReduce-based implementations for data-intensive tasks depend on how data is assigned from map to reduce tasks. A robust assignment strategy is crucial for handling skewed data and distributing the workload evenly among all reduce tasks. For the entity matching problem in the Big Data context, we propose BlockSlicer, a MapReduce-based approach that supports blocking techniques to reduce the entity matching search space. The approach uses a preprocessing MapReduce job to analyze the data distribution and improves load balancing by applying an efficient block-slicing strategy together with a well-known optimization algorithm to assign the generated match tasks. We evaluate the approach against an existing one that addresses the same problem on a real cloud infrastructure. The results show that our approach significantly improves the performance of distributed entity matching by reducing the amount of data generated in the map phase and lowering the overall execution time. |
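The load-balancing idea the abstract outlines — slice oversized blocks into sub-blocks and then assign the resulting match tasks to reducers with an optimization heuristic — can be sketched roughly as below. This is an illustrative reconstruction, not the paper's actual BlockSlicer algorithm: the slicing threshold, the quadratic cost model (pairwise comparisons within a block slice), and the greedy longest-processing-time (LPT) assignment are all assumptions.

```python
import heapq

def slice_blocks(block_sizes, max_size):
    """Split each block larger than max_size into sub-block match tasks.

    block_sizes: dict mapping block key -> number of entities in the block.
    Returns a list of (block_key, slice_size) match tasks.
    """
    tasks = []
    for key, size in block_sizes.items():
        full, rest = divmod(size, max_size)
        tasks += [(key, max_size)] * full   # full-sized slices
        if rest:
            tasks.append((key, rest))       # remainder slice, if any
    return tasks

def assign_tasks(tasks, num_reducers):
    """Greedy LPT: always give the next-largest task to the least-loaded reducer.

    Task cost is modeled as slice_size**2, since entity matching compares
    entity pairs within a slice (an assumption, not the paper's exact model).
    """
    ordered = sorted(tasks, key=lambda t: t[1] ** 2, reverse=True)
    # Min-heap of (current_load, reducer_id, assigned_tasks).
    heap = [(0, r, []) for r in range(num_reducers)]
    heapq.heapify(heap)
    for key, size in ordered:
        load, r, assigned = heapq.heappop(heap)
        assigned.append((key, size))
        heapq.heappush(heap, (load + size ** 2, r, assigned))
    return sorted(heap, key=lambda entry: entry[1])  # order by reducer id
```

For example, with blocks `{"a": 10, "b": 3, "c": 7}` and a slice limit of 5, block `a` yields two slices and block `c` yields slices of 5 and 2, so no single reducer is stuck with the cost of an entire skewed block.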
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/ISCC.2013.6755016 | ISCC |
Keywords | DocType | ISSN |
parallel programming,indexes,big data,load balancing,cloud computing,pattern matching,resource allocation,data handling,optimization,programming | Conference | 1530-1346 |
Citations | PageRank | References |
8 | 0.52 | 7 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Demetrio Gomes Mestre | 1 | 36 | 5.44 |
Carlos Eduardo Santos Pires | 2 | 57 | 10.68 |