Title
An Adaptive Framework for the Execution of Data-Intensive MapReduce Applications in the Cloud
Abstract
Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has gained a lot of interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, to optimize the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historic data and a utility function that ranks different configurations with regard to the costs they incur. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
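The abstract describes the optimization loop only at a high level. The following Python sketch illustrates one way a MAPE-K loop could rank per-job configurations across the three layers using historic runs and a cost-based utility function. It rests on simplifying assumptions that are not taken from the paper, VGE, or Hadoop: all class, field, and key names are hypothetical, runtime is predicted by scaling past runs linearly with input size, cost is nodes × runtime × a fixed per-node-hour price, and configurations that would miss a deadline are excluded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Configuration:
    """Hypothetical per-job configuration spanning the three layers."""
    app_chunk_mb: int   # application layer: size of input chunks (illustrative)
    reducers: int       # MapReduce layer: number of reduce tasks
    nodes: int          # resource layer: number of machines to allocate


@dataclass
class HistoricRun:
    """One observed job execution, kept as knowledge for later planning."""
    config: Configuration
    input_gb: float
    runtime_h: float


class AdaptiveOptimizer:
    """Minimal MAPE-K sketch: Monitor, Analyze, Plan, Execute over shared Knowledge."""

    def __init__(self, candidates: List[Configuration], price_per_node_h: float):
        self.candidates = candidates
        self.price_per_node_h = price_per_node_h
        self.knowledge: List[HistoricRun] = []  # K: historic data

    def monitor(self, run: HistoricRun) -> None:
        # M: record the observed runtime of a finished job.
        self.knowledge.append(run)

    def analyze(self, config: Configuration, input_gb: float) -> Optional[float]:
        # A: predict runtime (hours) by averaging the hours-per-GB rate of past
        # runs with the same configuration; None if there is no history for it.
        rates = [r.runtime_h / r.input_gb
                 for r in self.knowledge if r.config == config and r.input_gb > 0]
        if not rates:
            return None
        return input_gb * sum(rates) / len(rates)

    def utility(self, config: Configuration, runtime_h: float, deadline_h: float) -> float:
        # Utility is the negated monetary cost; missing the deadline is unacceptable.
        if runtime_h > deadline_h:
            return float("-inf")
        return -(config.nodes * runtime_h * self.price_per_node_h)

    def plan(self, input_gb: float, deadline_h: float) -> Optional[Configuration]:
        # P: rank every candidate that has historic data and keep the best one.
        # Returns None if no candidate can meet the deadline.
        best, best_u = None, float("-inf")
        for c in self.candidates:
            runtime = self.analyze(c, input_gb)
            if runtime is None:
                continue
            u = self.utility(c, runtime, deadline_h)
            if u > best_u:
                best, best_u = c, u
        return best

    def execute(self, config: Configuration) -> Dict[str, int]:
        # E: hand the chosen settings to each layer (hypothetical keys).
        return {
            "application_chunk_mb": config.app_chunk_mb,
            "mapreduce_reducers": config.reducers,
            "resource_nodes": config.nodes,
        }


if __name__ == "__main__":
    small = Configuration(64, 4, 4)
    large = Configuration(128, 16, 16)
    opt = AdaptiveOptimizer([small, large], price_per_node_h=0.10)
    opt.monitor(HistoricRun(small, input_gb=10.0, runtime_h=2.0))
    opt.monitor(HistoricRun(large, input_gb=10.0, runtime_h=1.0))
    print(opt.plan(20.0, deadline_h=5.0))  # relaxed deadline: cheaper, slower config
    print(opt.plan(20.0, deadline_h=3.0))  # tight deadline: larger config
```

With a relaxed deadline the smaller allocation wins because it is cheaper overall. Once the deadline rules it out, the larger and more expensive allocation is chosen, which is the kind of per-job trade-off a cost-ranking utility function is meant to expose.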
Year: 2011
DOI: 10.1109/IPDPS.2011.254
Venue: IPDPS Workshops
Keywords: application layer, hadoop framework, abstraction layer, data-intensive application, resource layer, optimization framework, data-intensive mapreduce applications, self-configuring adaptive framework, adaptive framework, file system layer, mapreduce layer, grid computing technology, cloud computing, distributed file system, programming model, autonomic computing, grid computing, application development, systems biology, xml, system biology, data analysis, distributed databases
Field: Distributed File System, Autonomic computing, Application layer, Grid computing, Programming paradigm, Computer science, Distributed database, Grid, Distributed computing, Cloud computing
DocType: Conference
Citations: 3
PageRank: 0.45
References: 11
Authors: 3
Martin Koehler (order 1, citations 56, PageRank 8.05)
Yuriy Kaniovskyi (order 2, citations 14, PageRank 2.78)
Siegfried Benkner (order 3, citations 614, PageRank 67.47)