Application And Storage-Aware Data Placement And Job Scheduling For Hadoop Clusters - Citegraph

Paper Info

Title
Application And Storage-Aware Data Placement And Job Scheduling For Hadoop Clusters

Abstract
As one of the most popular frameworks for large-scale analytics processing, Hadoop faces two challenges: both applications and storage devices are becoming heterogeneous. However, existing data placement and job scheduling schemes pay little attention to the heterogeneity of either application I/O requirements or I/O device capabilities, which can greatly degrade system efficiency. In this paper, we propose ASPS, an Application and Storage-aware data Placement and job Scheduling approach for Hadoop clusters. The idea is to place application data and schedule application tasks by considering both application I/O requirements and storage device characteristics. Specifically, ASPS first introduces novel metrics to quantify the I/O requirements of applications. Then, based on this quantification, ASPS places the data of different applications on their preferred storage devices. Finally, ASPS tries to launch jobs with high I/O requirements on nodes with the same type of faster devices to improve system efficiency. We have implemented ASPS in the Hadoop framework. Experimental results show that ASPS can reduce the completion time of a single application by up to 36% and the average completion time of six concurrent applications by 27%, compared to existing data placement policies and job scheduling approaches.
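
The three steps named in the abstract (quantify I/O requirements, place data, schedule jobs) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not define the ASPS metrics, so `io_intensity` (I/O bytes per CPU-second) is a hypothetical stand-in, and `App`, `place_data`, `schedule`, the `ssd_slots` parameter, and the sample workloads are invented names and values, not part of ASPS or Hadoop.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    bytes_io: float     # total bytes read + written (hypothetical profile value)
    cpu_seconds: float  # total CPU time (hypothetical profile value)

    @property
    def io_intensity(self) -> float:
        # Hypothetical metric: I/O bytes per CPU-second.
        # The paper's actual metrics are not given in the abstract.
        return self.bytes_io / self.cpu_seconds


def place_data(apps, ssd_slots):
    """Put the data of the most I/O-intensive apps on SSD, the rest on HDD."""
    ranked = sorted(apps, key=lambda a: a.io_intensity, reverse=True)
    return {a.name: ("SSD" if i < ssd_slots else "HDD")
            for i, a in enumerate(ranked)}


def schedule(apps, placement, nodes):
    """Launch each app on a free node whose device type matches its data."""
    free = {dev: list(names) for dev, names in nodes.items()}
    plan = {}
    # Serve the most I/O-intensive apps first so they get the matching nodes.
    for app in sorted(apps, key=lambda a: a.io_intensity, reverse=True):
        dev = placement[app.name]
        # Prefer a node with the matching device; otherwise fall back to
        # any node that is still free.
        pool = free[dev] or next((p for p in free.values() if p), [])
        plan[app.name] = pool.pop(0) if pool else None
    return plan


# Made-up workloads: a sort-like job is I/O-heavy, a k-means-like job is CPU-heavy.
apps = [App("sort", 8e9, 100), App("grep", 2e9, 50), App("kmeans", 1e9, 400)]
placement = place_data(apps, ssd_slots=1)
plan = schedule(apps, placement, {"SSD": ["n1"], "HDD": ["n2", "n3"]})
```

With these sample values, the I/O-heavy job is the only one placed on SSD and is launched on the SSD node, while the other two jobs land on the HDD nodes. A real implementation would have to work at HDFS block and task granularity, which this sketch does not attempt.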

Field	Value
Year	2020
DOI	10.1142/S0218126620502540
Venue	JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS
Keywords	Hadoop, MapReduce, HDFS, data placement, job scheduling, SSDs
DocType	Journal
Volume	29
Issue	16
ISSN	0218-1266
Citations	0
PageRank	0.34
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Tao Li	1	32	10.42
Shuibing He	2	109	20.45
Ping Chen	3	0	0.34
Siling Yang	4	0	0.34
Yanlong Yin	5	0	0.34
Cheng Xu	6	10	4.07
