Title
A fault-tolerant strategy for virtualized HPC clusters
Abstract
Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited, despite its potential both to improve resource utilization and to provide resource guarantees to its users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMware Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMware), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization offered by OpenVZ delivers the best overall performance, particularly for MPI scalability. With the knowledge gained from our VM evaluation, we extend OpenVZ to include support for checkpointing and fault-tolerance for MPI-based virtual server distributed computing.
Year
2009
DOI
10.1007/s11227-008-0259-0
Venue
The Journal of Supercomputing
Keywords
Virtualization, Benchmark, Fault-tolerance, Checkpointing, MPI
Field
Virtualization, File server, Virtual machine, Hardware virtualization, Supercomputer, Computer science, Parallel computing, Full virtualization, Paravirtualization, Operating system, Distributed computing, Scalability
DocType
Journal
Volume
50
Issue
3
ISSN
0920-8542
Citations
11
PageRank
1.00
References
31
Authors
2
Name | Order | Citations | PageRank
John Paul Walters | 1 | 267 | 20.45
Vipin Chaudhary | 2 | 838 | 83.24