Title
Poster: scalable infrastructure to support supercomputer resiliency-aware applications and load balancing
Abstract
High-performance computing systems are growing in complexity and component count. This trend exposes weaknesses in the underlying clustering infrastructure needed for continuous availability, maximum utilization, and efficient administration of such systems. To mitigate the problem, we present a highly scalable clustering infrastructure, based on peer-to-peer technologies, that supports resiliency-aware applications as well as efficient monitoring and load balancing. Supported services include membership, publish-subscribe messaging, convergecast, attribute replication, and a distributed hash table (DHT). A preliminary evaluation on an IBM BlueGene/P demonstrates scalability up to ~256K nodes.
Year
2011
DOI
10.1145/2148600.2148606
Venue
SC Companion
Keywords
scalable clustering infrastructure, load balancing, efficient administration, attribute replication, ibm bluegene, efficient monitoring, supercomputer resiliency-aware application, underlying clustering infrastructure, component count, supported service, scalable infrastructure, continuous availability, publish-subscribe messaging, middleware, clustering, load balance, publish subscribe, scalability
Field
Middleware, Psychological resilience, IBM, Peer-to-peer, Supercomputer, Computer science, Load balancing (computing), Parallel computing, Computer network, Cluster analysis, Scalability, Distributed computing
DocType
Conference
Citations
1
PageRank
0.35
References
6
Authors
4
Name               Order  Citations  PageRank
Yoav Tock          1      292        15.20
Benjamin Mandler   2      10         1.62
José E. Moreira    3      2282       230.26
Terry Jones        4      44         3.65