Title
Evaluating storage systems for scientific data in the cloud
Abstract
Infrastructure-as-a-Service (IaaS) clouds are an appealing resource for scientific computing. However, the bare-bones presentation of raw Linux virtual machines leaves much to the application developer. For many cloud applications, effective data handling is critical to efficient application execution. This paper investigates the capabilities of a variety of POSIX-accessible distributed storage systems to manage data access patterns resulting from workflow application executions in the cloud. We leverage the expressivity of the Swift parallel scripting framework to benchmark the performance of a number of storage systems using synthetic workloads and three real-world applications. We characterize two representative commercial storage systems (Amazon S3 and HDFS) and two emerging research-based storage systems (Chirp/Parrot and MosaStore). We find the use of aggregated node-local resources effective and economical compared with remotely located S3 storage. Our experiments show that applications run at scale with MosaStore show up to 30% improvement in makespan compared with those run with S3. We also find that storage-system-driven application deployments in the cloud result in better runtime performance compared with an on-demand, data-staging-driven approach.
Year
2014
DOI
10.1145/2608029.2608034
Venue
ScienceCloud@HPDC
Keywords
cloud, physical sciences and engineering, distributed computing, storage systems, object representation
Field
Virtual machine, Converged storage, Computer science, Distributed data store, Real-time computing, Workflow application, Data access, Group method of data handling, Operating system, Cloud computing, Distributed computing, Scripting language
DocType
Conference
Citations
3
PageRank
0.40
References
17
Authors
7
Name                Order   Citations   PageRank
Ketan Maheshwari    1       163         13.70
Justin M. Wozniak   2       464         35.32
Hao Yang            3       12          1.54
Daniel S. Katz      4       1496        121.04
Matei Ripeanu       5       2461        233.84
Victor Zavala       6       3           0.40
Michael Wilde       7       3           0.40