Preparation and optimization of a diverse workload for a large-scale heterogeneous system | 0 | 0.34 | 2019 |
Massively Parallel First-Principles Simulation of Electron Dynamics in Materials | 3 | 0.50 | 2017 |
The BLIS Framework: Experiments in Portability | 25 | 1.04 | 2016 |
An Early Performance Study of Large-Scale POWER8 SMP Systems | 2 | 0.36 | 2016 |
Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics | 5 | 0.56 | 2015 |
Massively parallel models of the human circulatory system | 8 | 0.56 | 2015 |
Scalable Community Detection with the Louvain Algorithm | 15 | 0.78 | 2015 |
Active Memory Cube: A processing-in-memory architecture for exascale systems | 21 | 0.74 | 2015 |
Parallel deep neural network training for big data on Blue Gene/Q | 11 | 0.67 | 2014 |
Deriving dense linear algebra libraries | 4 | 0.42 | 2013 |
Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor | 1 | 0.35 | 2013 |
Toward real-time modeling of human heart ventricles at cellular resolution: simulation of drug-induced arrhythmias | 2 | 0.38 | 2012 |
MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations | 28 | 1.50 | 2009 |
Programming the Linpack benchmark for Roadrunner | 5 | 0.82 | 2009 |
Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer | 10 | 0.80 | 2008 |
Optimization of BLAS on the cell processor | 4 | 0.45 | 2008 |
BlueGene/L applications: Parallelism On a Massive Scale | 3 | 0.62 | 2008 |
Optimization of fast Fourier transforms on the Blue Gene/L supercomputer | 1 | 0.39 | 2008 |
Is cache-oblivious DGEMM viable? | 4 | 0.46 | 2006 |
Minimal data copy for dense linear algebra factorization | 17 | 1.34 | 2006 |
Gordon Bell finalists I - Large scale drop impact analysis of mobile phone using ADVC on Blue Gene/L | 0 | 0.34 | 2006 |
Gordon Bell finalists I - Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform | 3 | 0.44 | 2006 |
Blue Gene/L performance tools | 6 | 1.16 | 2005 |
The science of deriving dense linear algebra algorithms | 69 | 8.38 | 2005 |
Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code | 24 | 2.51 | 2005 |
A fully portable high performance minimal storage hybrid format Cholesky algorithm | 20 | 1.80 | 2005 |
Early experience with scientific applications on the Blue Gene/L supercomputer | 6 | 0.69 | 2005 |
A new array format for symmetric and triangular matrices | 6 | 0.67 | 2004 |
Architecture and Performance of the BlueGene/L Message Layer | 4 | 1.29 | 2004 |
Unlocking the Performance of the BlueGene/L Supercomputer | 24 | 3.22 | 2004 |
Rapid development of high-performance linear algebra libraries | 1 | 0.37 | 2004 |
A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design | 14 | 3.40 | 2004 |
An overview of the BlueGene/L Supercomputer | 166 | 22.33 | 2002 |
A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format | 6 | 0.72 | 2002 |
A Family of High-Performance Matrix Multiplication Algorithms | 0 | 0.34 | 2001 |
Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice | 25 | 1.69 | 2001 |
FLAME: Formal Linear Algebra Methods Environment | 114 | 12.37 | 2001 |
Formal Methods for High-Performance Linear Algebra Libraries | 8 | 2.04 | 2000 |
A Flexible Class of Parallel Matrix Multiplication Algorithms | 28 | 2.77 | 1998 |
PLAPACK: High Performance through High-Level Abstraction | 9 | 1.29 | 1998 |
PLAPACK: Parallel Linear Algebra Package | 15 | 2.29 | 1997 |