Abstract |
---|
Although clusters are a popular form of high-performance computing, they remain more difficult to manage than sequential systems or even symmetric multiprocessors. In this paper, we identify a small set of primitive mechanisms that are general enough to serve as building blocks for a variety of resource-management problems. We then present STORM, a resource-management environment that embodies these mechanisms in a scalable, low-overhead, and efficient implementation. The key innovation behind STORM is a modular software architecture that reduces all resource-management functionality to a small number of highly scalable mechanisms, which in turn simplify the integration of resource management with low-level network features. As a result of this design, STORM can launch large parallel applications an order of magnitude faster than the best time reported in the literature and can gang-schedule a parallel application as fast as the node OS can schedule a sequential application. This paper describes the mechanisms and algorithms behind STORM and presents a detailed performance model showing that STORM's performance can scale to thousands of nodes. |
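
Gang scheduling, as mentioned in the abstract, means running all processes of a parallel job at the same time on their nodes and switching them in and out together. The Python below is a minimal sketch of that general idea and not STORM's implementation, whose mechanisms operate at the network level. It models the classic Ousterhout scheduling matrix: rows are time slots, columns are nodes, and each job occupies part of a single row. The class `GangMatrix`, its methods, and the first-fit placement policy are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GangMatrix:
    """Ousterhout-style matrix: rows are time slots, columns are nodes.

    Hypothetical illustration of gang scheduling, not STORM's scheduler.
    """
    num_nodes: int
    # Each slot is a list with one entry per node: a job id, or None if idle.
    slots: list = field(default_factory=list)

    def place(self, job: str, width: int) -> int:
        """Place a job needing `width` nodes into the first slot with room.

        All of the job's processes go into one row, so they are always
        co-scheduled. Returns the index of the slot used.
        """
        if width > self.num_nodes:
            raise ValueError("job is wider than the machine")
        # First fit: reuse an existing slot if it has enough idle nodes.
        for i, row in enumerate(self.slots):
            free = [n for n, occupant in enumerate(row) if occupant is None]
            if len(free) >= width:
                for n in free[:width]:
                    row[n] = job
                return i
        # Otherwise open a new time slot for this job.
        row = [None] * self.num_nodes
        for n in range(width):
            row[n] = job
        self.slots.append(row)
        return len(self.slots) - 1

    def running_at(self, tick: int) -> list:
        """Return the per-node assignment during time quantum `tick`.

        Slots are visited round-robin; on every tick all nodes switch to the
        same row, which is the coordinated context switch gang scheduling needs.
        """
        if not self.slots:
            return [None] * self.num_nodes
        return list(self.slots[tick % len(self.slots)])
```

For example, on four nodes, placing a 3-node job A, a 2-node job B, and a 1-node job C yields two time slots: A and C share slot 0, B runs in slot 1, and `running_at` alternates between them. First fit keeps the sketch short; a practical gang scheduler would also repack slots to reduce idle nodes, such as the two left unused in slot 1 here.
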
Year | DOI | Venue |
---|---|---|
2006 | 10.1109/TC.2006.206 | IEEE Transactions on Computers |
Keywords | Field | DocType |
---|---|---|
computer network management, network operating systems, parallel machines, processor scheduling, resource allocation, software architecture, workstation clusters, cluster computing, high-performance computing, large-scale parallel computers, modular software architecture, network operating system, node OS, parallel application gang-scheduling, performance model, scalable resource management environment, sequential application scheduling, sequential system management, symmetric multiprocessor management, Hardware/software interface, and modeling, integration, supercomputers, system architectures | Resource management, Supercomputer, Computer science, Parallel computing, Network operating system, Real-time computing, Resource allocation, Software architecture, Modular design, Computer cluster, Scalability, Distributed computing | Journal |
Volume | Issue | ISSN |
---|---|---|
55 | 12 | 0018-9340 |
Citations | PageRank | References |
---|---|---|
5 | 0.56 | 26 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Eitan Frachtenberg | 1 | 1060 | 85.08 |
Fabrizio Petrini | 2 | 2050 | 165.82 |
Juan Fernandez | 3 | 269 | 23.17 |
Scott Pakin | 4 | 1098 | 134.55 |