Title
Hybrid cloud and cluster computing paradigms for life science applications.
Abstract
BACKGROUND: Clouds and MapReduce have proven broadly useful for scientific computing, especially for parallel, data-intensive applications. However, they have limited applicability to some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure found in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open source Iterative MapReduce system.
RESULTS: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.
CONCLUSIONS: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications.
METHODS: We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.
Year
2010
DOI
10.1186/1471-2105-11-S12-S3
Venue
BMC Bioinformatics
Keywords
linear algebra, computational biology, cluster analysis, scientific computing, bioinformatics, algorithms, data intensive computing, cluster computing, data mining, information retrieval, metagenomics, microarrays, data analysis
Field
Linear algebra, Cluster (physics), Virtual machine, Cloud systems, Computer science, Software, Bioinformatics, Computer cluster, Cloud computing
DocType
Journal
Volume
11
Issue
S-12
ISSN
1471-2105
Citations
48
PageRank
2.00
References
13
Authors
12
Name               Order  Citations  PageRank
Judy Qiu           1      743        43.25
Jaliya Ekanayake   2      1040       60.58
Thilina Gunarathne 3      744        38.87
Jong Youl Choi     4      309        26.54
Seung-Hee Bae      5      571        31.67
Hui Li             6      457        20.10
Bingjing Zhang     7      521        25.17
Tak-Lon Wu         8      199        9.15
Yang Ruan          9      112        6.26
Saliya Ekanayake   10     90         9.34
Adam Hughes        11     81         3.90
Geoffrey Fox       12     4070       575.38