Title
A framework for flexible and scalable replica-exchange on production distributed CI
Abstract
Replica exchange is a powerful class of algorithms used for enhanced configurational and energetic sampling in a range of physical systems. Computationally, it is a type of application with multiple scales of communication. At a fine-grained level there is often communication within a replica, typically an MPI process. At a coarse-grained level, replicas communicate with other replicas, coarse-grained both in time and in the amount of data exchanged. This paper outlines a novel framework developed to support the flexible execution of large-scale replica exchange. The framework is flexible in the sense that it supports different coupling schemes between replicas and is agnostic to the underlying simulation, whether classical or quantum, serial or parallel. The scalability of the framework is assessed using standard simulation benchmarks. Despite the communication and coordination requirements that grow with the number of replicas, the framework supports the execution of hundreds of replicas without significant overhead. Although several specific aspects will benefit from further optimization, the first working prototype can fundamentally change the scale of replica exchange simulations possible on production distributed cyberinfrastructure such as XSEDE, and can support novel usage modes. This paper also marks the release of the framework to the broader biophysical simulation community and provides details on its usage.
Year
2013
DOI
10.1145/2484762.2484830
Venue
XSEDE
Keywords
increasing communication,hundreds replica,broader biophysical simulation community,large-scale replica exchange,replica exchange simulation,scalable replica-exchange,standard simulation benchmarks,specific underlying simulation,parallel simulation,replica exchange,novel framework,impact,distributed computing,hpc,technology
Field
Replica,Parallel simulation,Coupling,Computer science,Physical system,Parallel computing,Cyberinfrastructure,Sampling (statistics),Distributed computing,Scalability
DocType
Conference
Citations
2
PageRank
0.41
References
3
Authors
11
Name                Order  Citations  PageRank
Brian K. Radak      1      2          0.41
Melissa Romanus     2      42         6.08
Emilio Gallicchio   3      193        24.66
Tai-sung Lee        4      263        27.94
Ole Weidner         5      58         7.66
Nan-Jie Deng        6      37         2.32
Peng He             7      24         2.31
Wei Dai             8      2          0.41
Darrin M. York      9      29         7.38
Ronald M Levy       10     271        32.01
Shantenu Jha        11     188        32.40