Minimizing the usage of hardware counters for collective communication using triggered operations. | 0 | 0.34 | 2020 |
Parallelizing MPI Using Tasks For Hybrid Programming Models | 0 | 0.34 | 2018 |
Why is MPI so slow?: analyzing the fundamental limits in implementing MPI-3.1 | 2 | 0.39 | 2017 |
OpenMP® Runtime Instrumentation for Optimization. | 0 | 0.34 | 2017 |
Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers | 11 | 0.55 | 2015 |
Communication and topology-aware load balancing in Charm++ with TreeMatch. | 9 | 0.53 | 2013 |
A scalable double in-memory checkpoint and restart scheme towards exascale | 43 | 1.29 | 2012 |
Automated Load Balancing Invocation Based on Application Characteristics | 12 | 0.69 | 2012 |
A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect | 10 | 0.63 | 2012 |
Optimizing fine-grained communication in a biomolecular simulation application on Cray XK6 | 12 | 0.81 | 2012 |
ParSSSE: An Adaptive Parallel State Space Search Engine | 6 | 0.50 | 2011 |
Automatic Handling of Global Variables for Multi-threaded MPI Programs | 8 | 0.59 | 2011 |
An Adaptive Framework for Large-Scale State Space Search | 8 | 0.56 | 2011 |
Periodic hierarchical load balancing for large supercomputers | 29 | 1.12 | 2011 |
Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime | 10 | 0.62 | 2011 |
Simulating Large Scale Parallel Applications Using Statistical Models for Sequential Execution Blocks | 10 | 0.85 | 2010 |
Debugging large scale applications in a virtualized environment | 0 | 0.34 | 2010 |
Optimizing a parallel runtime system for multicore clusters: a case study | 10 | 0.69 | 2010 |
Robust non-intrusive record-replay with processor extraction | 4 | 0.44 | 2010 |
Automatic MPI to AMPI program transformation using Photran | 15 | 0.95 | 2010 |
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers | 20 | 0.92 | 2010 |
Overcoming scaling challenges in biomolecular simulations across multiple platforms | 46 | 3.58 | 2008 |
Performance evaluation of automatic checkpoint-based fault tolerance for AMPI and Charm++ | 14 | 0.84 | 2006 |
Scaling applications to massively parallel machines using Projections performance analysis tool | 24 | 1.80 | 2006 |
A system integration framework for coupled multiphysics simulations | 7 | 0.77 | 2006 |
Multiple Flows of Control in Migratable Parallel Programs | 11 | 0.89 | 2006 |
ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications | 21 | 1.47 | 2006 |
Poster reception - Charm++ simplifies coding for the cell processor | 1 | 0.38 | 2006 |
Performance evaluation of adaptive MPI | 77 | 3.03 | 2006 |
Performance Prediction Using Simulation of Large-Scale Interconnection Networks in POSE | 15 | 1.11 | 2005 |
Simulation-based performance prediction for large parallel machines | 51 | 2.79 | 2005 |
Performance Modeling and Programming Environments for Petaflops Computers and the Blue Gene Machine | 8 | 1.41 | 2004 |
BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines | 103 | 4.68 | 2004 |
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI | 90 | 3.33 | 2004 |
Scaling molecular dynamics to 3000 processors with projections: a performance analysis case study | 20 | 2.69 | 2003 |
NAMD: biomolecular simulation on thousands of processors | 114 | 10.80 | 2002 |
A parallel-object programming model for petaflops machines and Blue Gene/Cyclops | 8 | 1.94 | 2002 |