Title
HOG: Distributed Hadoop MapReduce on the Grid
Abstract
MapReduce is a powerful data processing platform for commercial and academic applications. In this paper, we build Hadoop On the Grid (HOG), a novel Hadoop MapReduce framework that runs on the Open Science Grid, which spans multiple institutions across the United States. Unlike previous MapReduce platforms, which run on dedicated environments such as clusters or clouds, HOG provides a free, elastic, and dynamic MapReduce environment on the opportunistic resources of the grid. In HOG, we improve Hadoop's fault tolerance for wide-area data analysis by mapping data centers across the U.S. to virtual racks and creating multi-institution failure domains. Our modifications to the Hadoop framework are transparent to existing Hadoop MapReduce applications. In the evaluation, we successfully scale HOG to 1100 nodes on the grid and evaluate it with a simulated Facebook Hadoop MapReduce workload. We conclude that HOG's rapid scalability can provide performance comparable to that of a dedicated Hadoop cluster.
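The abstract's idea of treating each data center as a virtual rack can be illustrated with a rack-awareness topology script of the kind Hadoop supports: Hadoop passes one or more host names or IP addresses as arguments and reads one rack path per host from standard output. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The rule of using the last two labels of the host name as the site, the function name `site_rack`, and the example host names are all hypothetical.

```python
import sys

# Rack path returned when the site cannot be inferred from the host name.
DEFAULT_RACK = "/default-rack"


def site_rack(hostname: str) -> str:
    """Map a worker host name to a virtual rack named after its site domain.

    Assumption (not from the paper): the last two labels of a fully
    qualified host name identify the institution hosting the node.
    """
    labels = hostname.strip().lower().rstrip(".").split(".")
    # Bare host names and IPv4 addresses carry no usable site information.
    if len(labels) < 3 or all(label.isdigit() for label in labels):
        return DEFAULT_RACK
    return "/" + ".".join(labels[-2:])


def main(argv):
    # Emit one rack path per host, in the order the hosts were given.
    print(" ".join(site_rack(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this mapping, all nodes from one institution share a rack, so a rack-aware replica placement policy would tend to spread block replicas across institutions, which is the effect the abstract describes as multi-institution failure domains.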
Year: 2012
DOI: 10.1109/SC.Companion.2012.154
Venue: SC Companion
Keywords: hadoop-on-the-grid, wide area data analysis, mapreduce, hadoop mapreduce application, hadoop fault tolerance, united states, parallel programming, distributed hadoop mapreduce framework, grid computing, powerful data, middleware, fault tolerant computing, data analysis, facebook, hadoop framework, mapping data center, dynamic mapreduce environment, hog, simulated facebook hadoop mapreduce, previous mapreduce platform, novel hadoop mapreduce framework, dedicated hadoop cluster
Field: Middleware, Data processing, Grid computing, Data-intensive computing, Data mapping, Computer science, Parallel computing, Fault tolerance, Operating system, Grid, Distributed computing, Scalability
DocType: Conference
ISBN: 978-1-4673-6218-4
Citations: 9
PageRank: 0.61
References: 11
Authors: 4
Name, Order, Citations, PageRank
Chen He 1714101.22
Derek Weitzel 2113.03
David Swanson 3483.17
Ying Lu 4654.01