An Uncoupled Data Process and Transfer Model for MapReduce. - Citegraph

Paper Info

Title
An Uncoupled Data Process and Transfer Model for MapReduce.

Abstract
In the original MapReduce model, reduce tasks fetch the output data of map tasks by pulling it. However, a reduce task that occupies a reduce slot cannot start executing until all of its corresponding map tasks have completed. This dependence between map and reduce tasks is called the coupled relationship in this paper. The coupled relationship leads to two problems: reduce slot hoarding and underutilized network bandwidth. Meanwhile, storing the result data is costly, especially when the system keeps replicas, which leads to an inefficient storage problem. We propose an uncoupled data process and transfer model to address these problems. Four core techniques (weighted mapping, data pushing, partial data backup, and data compression) are introduced and applied in Apache Hadoop, the mainstream open-source implementation of the MapReduce model. This work has been put into practice at Baidu, the biggest search engine company in China. A real-world web data processing application shows that our model improves system throughput by 29.5%, reduces total wall time by 22.8%, provides a weighted wall time acceleration of 26.3%, and reduces the result data stored on disk by 70%. Moreover, the implementation of this model is transparent to users and compatible with the original Hadoop.

Year	DOI	Venue
2015	10.1007/978-3-662-46335-2_2	Lecture Notes in Computer Science
Keywords	Field	DocType
MapReduce,Data transfer,Uncoupled model,Compression	Data processing,Search engine,Data transmission,Computer science,Parallel computing,Bandwidth (signal processing),Acceleration,Throughput,Data compression,Backup,Distributed computing	Journal
Volume	ISSN	Citations
8970	0302-9743	0
PageRank	References	Authors
0.34	13	4

Authors (4 rows)

Name	Order	Citations	PageRank
Li Zha	1	0	0.34
Jie Zhang	2	47	15.01
Wei Liu	3	0	0.68
Jian Lin	4	34	8.22
