Title
HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters.
Abstract
Because of competition for cluster resources and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Because machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what and when to prefetch still poses serious challenges to cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service, to improve data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes to which future map tasks should be assigned and then preload the input data into memory without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
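The basic idea in the abstract (predict where upcoming map tasks will run, then preload their input) can be illustrated with a toy sketch. This is not HPSO's actual algorithm, which the abstract does not detail. It assumes, hypothetically, that the scheduler has an estimate of when each node will free a slot; the names `Node`, `seconds_until_free`, and `plan_prefetch` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    seconds_until_free: float  # assumed estimate of when a task slot frees up
    local_blocks: set = field(default_factory=set)   # blocks stored on this node
    memory_cache: set = field(default_factory=set)   # blocks already prefetched


def plan_prefetch(pending_blocks, nodes):
    """Assign one pending map task (named by its input block) to each node,
    in order of predicted slot availability, and flag the assignments whose
    input must be prefetched because the node does not already hold it."""
    remaining = list(pending_blocks)
    plan = []
    for node in sorted(nodes, key=lambda n: n.seconds_until_free):
        if not remaining:
            break
        # Prefer a task whose input is already on this node (data-local).
        available = [b for b in remaining
                     if b in node.local_blocks or b in node.memory_cache]
        block = available[0] if available else remaining[0]
        needs_prefetch = not available
        if needs_prefetch:
            # Stand-in for an asynchronous remote read into memory,
            # issued before the task is actually launched.
            node.memory_cache.add(block)
        remaining.remove(block)
        plan.append((block, node.name, needs_prefetch))
    return plan
```

In this sketch a prefetch is only planned when no data-local task is available for the predicted node, so the remote read can overlap with the time the node still needs to free its slot.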
Year: 2014
DOI: 10.1007/978-3-319-11194-0_7
Venue: ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II
Keywords: Data locality, MapReduce clusters, prefetching, task scheduler
Field: Critical factors, Cluster (physics), Locality, Scheduling (computing), Computer science, Parallel computing, Data delay, Instruction prefetch, Data access, Distributed computing
DocType: Conference
Volume: 8631
ISSN: 0302-9743
Citations: 6
PageRank: 0.46
References: 16
Authors: 5
Name            Order   Citations   PageRank
Mingming Sun    1       32          4.87
Hang Zhuang     2       26          6.54
Xuehai Zhou     3       551         77.54
Kun Lu          4       21          3.75
Changlong Li    5       26          6.88