Low synchronization GMRES algorithms | 0 | 0.34 | 2018 |
Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation | 1 | 0.35 | 2017 |
Bidiagonalization with Parallel Tiled Algorithms | 0 | 0.34 | 2016 |
A Makespan Lower Bound for the Scheduling of the Tiled Cholesky Factorization based on ALAP scheduling | 0 | 0.34 | 2015 |
A Backward/Forward Recovery Approach for the Preconditioned Conjugate Gradient Method | 3 | 0.45 | 2015 |
Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers | 1 | 0.36 | 2015 |
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms | 2 | 0.45 | 2013 |
A Greedy Algorithm for Optimally Pipelining a Reduction | 0 | 0.34 | 2013 |
Hierarchical QR factorization algorithms for multi-core clusters | 19 | 0.83 | 2013 |
Topic 10: Parallel Numerical Algorithms - (Introduction). | 0 | 0.34 | 2013 |
Communication-optimal Parallel and Sequential QR and LU Factorizations | 105 | 5.33 | 2012 |
Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems | 7 | 0.53 | 2012 |
Flexible Variants of Block Restarted GMRES Methods with Application to Geophysics | 14 | 0.60 | 2012 |
Poster: Matrices over Runtime Systems at Exascale | 4 | 0.43 | 2012 |
Abstract: Matrices Over Runtime Systems at Exascale | 0 | 0.34 | 2012 |
Any admissible cycle-convergence behavior is possible for restarted GMRES at its initial cycles | 5 | 0.56 | 2011 |
LU factorization for accelerator-based systems | 31 | 1.61 | 2011 |
Tiled QR factorization algorithms | 14 | 0.76 | 2011 |
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA | 40 | 1.64 | 2011 |
QCG-OMPI: MPI applications on grids | 6 | 0.59 | 2011 |
A Critical Path Approach to Analyzing Parallelism of Algorithmic Variants. Application to Cholesky Inversion | 4 | 0.43 | 2010 |
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures | 11 | 0.72 | 2010 |
Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion | 7 | 0.60 | 2010 |
Accelerating scientific computations with mixed precision algorithms | 47 | 3.69 | 2009 |
QR factorization of tall and skinny matrices in a grid computing environment | 26 | 1.26 | 2009 |
Computing the conditioning of the components of a linear least-squares solution | 9 | 0.68 | 2009 |
A class of parallel tiled linear algebra algorithms for multicore architectures | 218 | 13.66 | 2009 |
The Problem With the Linpack Benchmark 1.0 Matrix Generator | 2 | 0.37 | 2009 |
Communication-avoiding parallel and sequential QR factorizations | 18 | 1.62 | 2008 |
Algorithm-based fault tolerance applied to high performance computing | 89 | 2.29 | 2008 |
Advanced MPI Programming | 0 | 0.34 | 2007 |
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems | 42 | 5.01 | 2007 |
A Distributed Packed Storage For Large Dense Parallel In-Core Calculations | 0 | 0.34 | 2007 |
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment | 34 | 1.44 | 2007 |
Performance Optimization and Modeling of Blocked Sparse Kernels | 21 | 1.61 | 2007 |
Parallel tiled QR factorization for multicore architectures | 31 | 2.27 | 2007 |
A note on the error analysis of classical Gram–Schmidt | 5 | 0.61 | 2006 |
Recent advances in dense linear algebra: minisymposium abstract | 0 | 0.34 | 2006 |
The impact of multicore on math software | 48 | 3.83 | 2006 |
Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures | 7 | 0.72 | 2006 |
Prospectus for the next LAPACK and ScaLAPACK libraries | 7 | 1.80 | 2006 |
Tools and techniques for performance - Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems) | 3 | 0.87 | 2006 |
Exploiting Mixed Precision Floating Point Hardware in Scientific Computations | 6 | 0.56 | 2006 |
Hash functions for datatype signatures in MPI | 2 | 0.44 | 2005 |
Fault tolerant high performance computing by a coding approach | 60 | 3.20 | 2005 |
Algorithm 842: A set of GMRES routines for real and complex arithmetics on high performance computers | 20 | 2.11 | 2005 |
Rounding error analysis of the classical Gram-Schmidt orthogonalization process | 28 | 2.05 | 2005 |
Comparison of nonlinear conjugate-gradient methods for computing the electronic properties of nanostructure architectures | 1 | 0.38 | 2005 |
A Rank-k Update Procedure for Reorthogonalizing the Orthogonal Factor from Modified Gram-Schmidt | 1 | 0.43 | 2004 |