Title
Application And Storage-Aware Data Placement And Job Scheduling For Hadoop Clusters
Abstract
As one of the most popular frameworks for large-scale analytics, Hadoop faces a twofold challenge: both its applications and its storage devices are becoming heterogeneous. However, existing data placement and job scheduling schemes pay little attention to the heterogeneity of either application I/O requirements or storage device capabilities, and can therefore greatly degrade system efficiency. In this paper, we propose ASPS, an Application and Storage-aware data Placement and job Scheduling approach for Hadoop clusters. The idea is to place application data and schedule application tasks by considering both application I/O requirements and storage device characteristics. Specifically, ASPS first introduces novel metrics to quantify the I/O requirements of applications. Then, based on this quantification, ASPS places the data of different applications on their preferred storage devices. Finally, to improve system efficiency, ASPS tries to launch jobs with high I/O requirements on nodes that have the same type of faster devices. We have implemented ASPS in the Hadoop framework. Experimental results show that, compared to existing data placement policies and job scheduling approaches, ASPS can reduce the completion time of a single application by up to 36% and the average completion time of six concurrent applications by 27%.
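The abstract does not define ASPS's metrics or algorithms, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical I/O-intensity metric (bytes of I/O per CPU-second), a two-tier cluster with "SSD" and "HDD" nodes, and a simple greedy policy: the most I/O-intensive applications get their data on SSD first, and each application's tasks are then steered to nodes whose device type matches the tier holding its data. The class and function names (`App`, `place_data`, `schedule`) and the workload figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    io_bytes: float     # total bytes read + written (hypothetical profiling input)
    cpu_seconds: float  # CPU time consumed (hypothetical profiling input)
    data_gb: float      # size of the application's input data

    @property
    def io_intensity(self) -> float:
        # Hypothetical metric: bytes of I/O per CPU-second (not the paper's metric).
        return self.io_bytes / self.cpu_seconds


def place_data(apps, ssd_capacity_gb):
    """Greedy first-fit: most I/O-intensive apps get SSD space while it lasts."""
    placement = {}
    remaining = ssd_capacity_gb
    for app in sorted(apps, key=lambda a: a.io_intensity, reverse=True):
        if app.data_gb <= remaining:
            placement[app.name] = "SSD"
            remaining -= app.data_gb
        else:
            placement[app.name] = "HDD"
    return placement


def schedule(apps, placement, nodes):
    """Prefer nodes whose device type matches the tier holding the app's data.

    nodes maps node name -> device type ("SSD" or "HDD").
    """
    plan = {}
    for app in sorted(apps, key=lambda a: a.io_intensity, reverse=True):
        tier = placement[app.name]
        matching = [n for n, dev in nodes.items() if dev == tier]
        plan[app.name] = matching or list(nodes)  # fall back to any node
    return plan


# Hypothetical workloads and cluster layout, for illustration only.
apps = [
    App("terasort", io_bytes=8e11, cpu_seconds=2e3, data_gb=100),
    App("wordcount", io_bytes=2e11, cpu_seconds=4e3, data_gb=50),
    App("kmeans", io_bytes=1e10, cpu_seconds=5e3, data_gb=30),
]
nodes = {"n1": "SSD", "n2": "HDD", "n3": "HDD"}

placement = place_data(apps, ssd_capacity_gb=120)
plan = schedule(apps, placement, nodes)
print(placement)
print(plan)
```

Here the I/O-heavy `terasort` data lands on SSD and its tasks are steered to `n1`, while the remaining applications fall back to HDD nodes because the assumed SSD capacity is exhausted. A real implementation would also have to handle data locality, HDFS replication, and load balancing, which this sketch ignores.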
Year: 2020
DOI: 10.1142/S0218126620502540
Venue: Journal of Circuits, Systems and Computers
Keywords: Hadoop, MapReduce, HDFS, data placement, job scheduling, SSDs
DocType: Journal
Volume: 29
Issue: 16
ISSN: 0218-1266
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name | Order | Citations | PageRank
Tao Li | 1 | 32 | 10.42
Shuibing He | 2 | 109 | 20.45
Ping Chen | 3 | 0 | 0.34
Siling Yang | 4 | 0 | 0.34
Yanlong Yin | 5 | 0 | 0.34
Cheng Xu | 6 | 10 | 4.07