Title
An Abstract Interface for System Software on Large-Scale Clusters
Abstract
Scalable management of distributed resources is one of the major challenges when building large-scale clusters for high-performance computing. This task includes transparent fault tolerance, efficient deployment of resources and support for all the needs of parallel applications: parallel I/O, deterministic behavior and responsiveness. These challenges may seem daunting with commodity hardware and operating systems, since they were not designed to support a global, single management view of a large-scale system. In this paper we propose and demonstrate an abstract network interface in the cluster interconnect to facilitate the implementation of a simple yet powerful global operating system. This system, which can be thought of as a coarse-grain SIMD operating system, can allow commodity clusters to grow to thousands of nodes, while still retaining the usability and performance of the single-node workstation.
Year: 2006
DOI: 10.1093/comjnl/bxl020
Venue: Comput. J.
Keywords: commodity hardware, parallel application, abstract network interface, cluster computing, cluster operating system, powerful global operating system, network hardware, single management view, scalable management, commodity cluster, coarse-grain simd operating system, large-scale clusters, abstract interface, resource management, fault tolerance, system software, large-scale system, large-scale cluster, operating system, network interface, resource manager, fault tolerant
DocType: Journal
Volume: 49
Issue: 4
ISSN: 0010-4620
Citations: 1
PageRank: 0.36
References: 26
Authors: 4
Name                  Order  Citations  PageRank
Juan Fernandez        1      269        23.17
Eitan Frachtenberg    2      1060       85.08
Fabrizio Petrini      3      2050       165.82
José-Carlos Sancho    4      110        6.88