Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions | 0 | 0.34 | 2022 |
The Linear Algebra Mapping Problem. Current State of Linear Algebra Languages and Libraries | 0 | 0.34 | 2022 |
Rational Spectral Filters With Optimal Convergence Rate | 0 | 0.34 | 2021 |
Linnea: Automatic Generation of Efficient Linear Algebra Programs | 0 | 0.34 | 2021 |
Automatic Generation of Efficient Linear Algebra Programs | 0 | 0.34 | 2020 |
Spin Summations: A High-Performance Perspective. | 0 | 0.34 | 2019 |
A Timer-Augmented Cost Function for Load Balanced DSMC. | 0 | 0.34 | 2019 |
Accelerating AIREBO: Navigating The Journey From Legacy To High-Performance Code | 1 | 0.36 | 2019 |
Program Generation for Linear Algebra Using Multiple Layers of DSLs. | 0 | 0.34 | 2019 |
The generalized matrix chain algorithm. | 0 | 0.34 | 2018 |
Design of a high-performance GEMM-like Tensor-Tensor Multiplication. | 14 | 0.61 | 2018 |
Optimizing AIREBO: Navigating the Journey from Complex Legacy Code to High Performance. | 0 | 0.34 | 2018 |
Accelerating molecular dynamics codes by performance and accuracy modeling. | 0 | 0.34 | 2018 |
Program generation for small-scale linear algebra applications. | 3 | 0.40 | 2018 |
MatchPy: A Pattern Matching Library. | 2 | 0.46 | 2017 |
Assessment of sound spatialisation algorithms for sonic rendering with headphones | 0 | 0.34 | 2017 |
Efficient Pattern Matching in Python. | 1 | 0.43 | 2017 |
TTC: A high-performance Compiler for Tensor Transpositions. | 5 | 0.42 | 2017 |
High-performance generation of the Hamiltonian and Overlap matrices in FLAPW methods. | 2 | 0.45 | 2017 |
Algorithm 979: Recursive Algorithms for Dense Linear Algebra - The ReLAPACK Collection. | 0 | 0.34 | 2017 |
Linnea: Compiling Linear Algebra Expressions to High-Performance Code | 0 | 0.34 | 2017 |
The Tersoff many-body potential: Sustainable performance through vectorization. | 0 | 0.34 | 2017 |
HPTT: A High-Performance Tensor Transposition C++ Library. | 9 | 0.52 | 2017 |
TTC: A Tensor Transposition Compiler for Multiple Architectures. | 7 | 0.50 | 2016 |
Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection. | 1 | 0.35 | 2016 |
Large Scale Parallel Computations in R through Elemental. | 0 | 0.34 | 2016 |
A Note on Time Measurements in LAMMPS. | 0 | 0.34 | 2016 |
The Vectorization of the Tersoff Multi-Body Potential: An Exercise in Performance Portability. | 13 | 1.91 | 2016 |
Accelerating scientific codes by performance and accuracy modeling. | 0 | 0.34 | 2016 |
The Matrix Chain Algorithm to Compile Linear Algebra Expressions. | 0 | 0.34 | 2016 |
Large-scale linear regression: Development of high-performance routines. | 0 | 0.34 | 2016 |
Parallel computing on graphics processing units and heterogeneous platforms | 0 | 0.34 | 2015 |
The ELAPS framework: Experimental Linear Algebra Performance Studies | 2 | 0.38 | 2015 |
A Scalable, Linear-Time Dynamic Cutoff Algorithm For Molecular Dynamics | 0 | 0.34 | 2015 |
High performance solutions for big-data GWAS | 1 | 0.36 | 2014 |
On The Performance Prediction Of BLAS-Based Tensor Contractions | 6 | 0.47 | 2014 |
Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions. | 14 | 0.63 | 2014 |
GWAS on GPUs: streaming data from HDD for sustained performance | 0 | 0.34 | 2013 |
Dissecting the FEAST algorithm for generalized eigenproblems | 12 | 0.92 | 2013 |
Deriving dense linear algebra libraries. | 4 | 0.42 | 2013 |
Algorithms for large-scale whole genome association analysis | 2 | 0.41 | 2013 |
Streaming Data from HDD to GPUs for Sustained Peak Performance | 0 | 0.34 | 2013 |
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach | 1 | 0.34 | 2013 |
Application-tailored linear algebra algorithms: A search-based approach | 4 | 0.41 | 2013 |
High-Performance Solvers for Dense Hermitian Eigenproblems | 9 | 0.60 | 2013 |
Towards Predictability of Operating System Supported Communication for PCIe Based Clusters. | 0 | 0.34 | 2013 |
Correlations in sequences of generalized eigenproblems arising in Density Functional Theory. | 6 | 0.60 | 2012 |
Performance Modeling for Dense Linear Algebra | 13 | 0.66 | 2012 |
Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies | 7 | 0.66 | 2012 |
Solving sequences of generalized least-squares problems on multi-threaded architectures. | 7 | 0.68 | 2012 |