Abstract | ||
---|---|---|
We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance, stable algorithm for the (blocked) LU factorization. |
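The abstract combines two numerical ingredients: dot products that are correctly rounded, and therefore independent of summation order, and an unblocked LU factorization with partial pivoting assembled from dot, matrix-vector, triangular-solve, and scaling kernels. The following sequential Python sketch only illustrates these numerics; it is not the authors' GPU code. Its assumptions: `split`, `two_prod`, `dot_cr`, and `lu_left_looking` are hypothetical names; Dekker's TwoProd serves as the error-free transformation; `math.fsum` (correctly rounded on typical IEEE 754 builds) stands in for the long accumulator; and the left-looking ordering of the kernels is one plausible arrangement chosen for illustration. Iterative refinement of the triangular solve is not shown.

```python
import math

# Veltkamp splitting constant for IEEE 754 binary64: 2**27 + 1.
SPLITTER = 2.0**27 + 1.0


def split(a):
    """Veltkamp split: a == hi + lo, halves short enough that pairwise products are exact."""
    c = SPLITTER * a
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Dekker's TwoProd (error-free transformation): p + e == a * b exactly,
    assuming round-to-nearest and no overflow or underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot_cr(x, y, c=0.0):
    """Correctly rounded c + sum_i x[i] * y[i]; order-independent, hence reproducible.

    Each product is expanded exactly into two doubles; math.fsum then stands in
    for the long accumulator by rounding the exact sum only once."""
    terms = [c]
    for a, b in zip(x, y):
        terms.extend(two_prod(a, b))
    return math.fsum(terms)


def lu_left_looking(A):
    """Hypothetical unblocked left-looking LU with partial pivoting.

    Returns (LU, perm): unit lower L below the diagonal, U on and above it,
    with P*A = L*U, where row i of P*A is row perm[i] of A."""
    n = len(A)
    A = [[float(v) for v in row] for row in A]  # work on a copy
    perm = list(range(n))
    for j in range(n):
        # (1) Column j above the diagonal: unit-lower triangular solve (trsv).
        for i in range(j):
            A[i][j] = dot_cr([-A[i][k] for k in range(i)],
                             [A[k][j] for k in range(i)], A[i][j])
        # (2) Column j on/below the diagonal: matrix-vector update (gemv),
        #     computed as one correctly rounded dot product per row.
        u = [A[k][j] for k in range(j)]
        for i in range(j, n):
            A[i][j] = dot_cr([-A[i][k] for k in range(j)], u, A[i][j])
        # (3) Partial pivoting: largest magnitude in column j (iamax), full-row swap.
        p = max(range(j, n), key=lambda i: abs(A[i][j]))
        if A[p][j] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != j:
            A[j], A[p] = A[p], A[j]
            perm[j], perm[p] = perm[p], perm[j]
        # (4) Scale the subdiagonal by the pivot (scal). A true division is
        #     correctly rounded; multiplying by a rounded 1/pivot would not be.
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
    return A, perm
```

Because every entry is a single rounding of an exact expression, reordering the terms, as a different GPU thread schedule would, cannot change the result. For example, plain left-to-right accumulation of `1e16 + 1.0 - 1e16` in binary64 gives `0.0`, whereas `dot_cr([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns `1.0`.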
Year | DOI | Venue |
---|---|---|
2019 | 10.1177/1094342019832968 | Periodicals |

Keywords | Field | DocType |
---|---|---|
LU factorization, BLAS, reproducibility, accuracy, long accumulator, error-free transformation, GPUs | Graphics, Iterative refinement, Computer science, Parallel computing, Pivot element, Scaling, LU decomposition | Journal |

Volume | Issue | ISSN |
---|---|---|
33 | 5 | 1094-3420 |

Citations | PageRank | References |
---|---|---|
0 | 0.34 | 15 |

Authors |
---|
4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Roman Iakymchuk | 1 | 32 | 5.98 |
Stef Graillat | 2 | 92 | 16.06 |
David Defour | 3 | 131 | 18.28 |
Enrique S. Quintana-Ortí | 4 | 1317 | 150.59 |