Title
Towards energy awareness in Hadoop.
Abstract
With the growing use of data centers built from commodity clusters for data-intensive applications, the energy efficiency of these setups is becoming a paramount concern for data center operators. Moreover, applications developed for the Hadoop framework, which has become the de facto implementation of the MapReduce framework, now comprise complex workflows that are managed by specialized workflow schedulers, such as Oozie. These schedulers assume cluster resources to be homogeneous and often consider data locality to be the only scheduling constraint. However, this assumption increasingly does not hold in modern data centers. The addition of low-power computing devices and regular hardware upgrades have made heterogeneity the norm, in that clusters now comprise several logical sub-clusters, each with its own performance and energy profile. In this paper we present oSched, a workflow scheduler that profiles the performance and energy characteristics of applications on each hardware sub-cluster in a heterogeneous cluster in order to improve the application-resource match while ensuring energy-efficiency and performance-related Service Level Agreement (SLA) goals. oSched borrows from our earlier work, fSched, a hardware-aware scheduler that improves the resource-application match to improve application performance. We evaluate oSched on three clusters with different hardware configurations and energy profiles, where each sub-cluster comprises five homogeneous nodes. Our evaluation of oSched shows that application performance and power characteristics vary significantly across different hardware configurations. We show that hardware-aware scheduling can perform 12.8% faster while saving 21% more power than hardware-oblivious scheduling for the studied applications.
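The profile-driven matching described in the abstract can be illustrated with a minimal sketch. This is not oSched's actual algorithm, which the abstract does not specify. It assumes a simple policy: among the sub-clusters whose profiled runtime meets an SLA deadline, pick the one with the lowest estimated energy. The names `Profile` and `pick_subcluster`, the sub-cluster labels, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical per-application profile measured on one sub-cluster."""
    runtime_s: float    # profiled runtime in seconds
    avg_power_w: float  # average power draw in watts while the job runs

    @property
    def energy_j(self) -> float:
        # Energy estimate: runtime multiplied by average power.
        return self.runtime_s * self.avg_power_w


def pick_subcluster(profiles: Dict[str, Profile], deadline_s: float) -> Optional[str]:
    """Return the lowest-energy sub-cluster that meets the deadline, or None."""
    feasible = {name: p for name, p in profiles.items() if p.runtime_s <= deadline_s}
    if not feasible:
        return None  # no sub-cluster can satisfy the SLA deadline
    return min(feasible, key=lambda name: feasible[name].energy_j)


# Illustrative numbers only; these are not measurements from the paper.
profiles = {
    "low_power": Profile(runtime_s=900.0, avg_power_w=60.0),
    "mid_range": Profile(runtime_s=500.0, avg_power_w=150.0),
    "high_end": Profile(runtime_s=300.0, avg_power_w=400.0),
}

choice_relaxed = pick_subcluster(profiles, deadline_s=1000.0)
choice_tight = pick_subcluster(profiles, deadline_s=600.0)
choice_none = pick_subcluster(profiles, deadline_s=100.0)
```

With a relaxed deadline the sketch favors the slow but frugal sub-cluster; tightening the deadline forces a faster, more power-hungry one, which is the performance and energy trade-off the abstract refers to.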
Year
2014
DOI
10.1109/NDM.2014.6
Venue
NDM@SC
Keywords
throughput
Field
Cluster (physics), Locality, Computer science, Efficient energy use, Scheduling (computing), Service-level agreement, Computer network, Throughput, Data center, Workflow, Distributed computing
DocType
Conference
Citations
1
PageRank
0.35
References
24
Authors
4
Name                Order  Citations  PageRank
K. R. Krish         1      65         4.17
M. Safdar Iqbal     2      93         4.76
M. Mustafa Rafique  3      157        15.49
Ali R. Butt         4      651        47.51