Title
Overcoming Hadoop Scaling Limitations through Distributed Task Execution
Abstract
Data-driven programming models such as MapReduce have gained popularity in large-scale data processing. Substantial effort on the Hadoop implementation and on framework decoupling (e.g., YARN, Mesos) has allowed Hadoop to scale to tens of thousands of commodity cluster processors. However, the centralized designs of the resource manager, the task scheduler, and the metadata management of HDFS limit Hadoop's scalability to tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task execution framework, MATRIX, which was originally designed to schedule the execution of data-intensive many-task computing scientific applications on supercomputers. We propose to leverage the distributed design principles of MATRIX to schedule arbitrary data processing applications in the cloud. We compare MATRIX with YARN on the Amazon cloud, using typical Hadoop workloads (WordCount, TeraSort, Grep, and RandomWriter) and the Ligand application from bioinformatics. Experimental results show that MATRIX outperforms YARN by 1.27X for the typical workloads and by 2.04X for the real application. We also run and simulate MATRIX with fine-grained, sub-second workloads. The simulation results show an efficiency of 86.8% at 64K cores for the 150ms workload, which indicates that MATRIX has the potential to enable Hadoop to scale to extreme-scale data centers for fine-grained workloads.
Year: 2015
DOI: 10.1109/CLUSTER.2015.42
Venue: Cluster Computing
Keywords: data driven programming model, MapReduce, task execution framework, scheduling, extreme scales
Field: Metadata, File system, Yarn, Computer science, Scheduling (computing), Parallel computing, Server, Real-time computing, Metadata management, Distributed computing, Cloud computing, Scalability
DocType: Conference
ISSN: 1552-5244
Citations: 24
PageRank: 0.65
References: 48
Authors: 9
Name             Order  Citations  PageRank
Ke Wang          1      313        13.66
Ning Liu         2      24         0.98
Iman Sadooghi    3      65         4.43
Xi Yang          4      75         5.61
Xiaobing Zhou    5      228        8.02
Tonglin Li       6      215        10.65
Michael Lang     7      266        19.91
Xian-he Sun      8      1987       182.64
Ioan Raicu       9      2264       129.28