Title
Hierarchical approach for deriving a reproducible unblocked LU factorization
Abstract
We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.
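The bottom-up idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: it assumes exact rational arithmetic (Python's `fractions.Fraction`) as a stand-in for the long accumulator and error-free transformations named in the keywords, and the helper names `exact_dot_sub` and `lu_unblocked` are hypothetical. Each entry of L and U is obtained from one inner product that is accumulated exactly and rounded once, so the result does not depend on the order of summation.

```python
from fractions import Fraction


def exact_dot_sub(a, xs, ys):
    """Return a - sum(x*y), accumulated exactly and rounded once to a double.

    Fraction arithmetic stands in for a long accumulator (an assumption of
    this sketch); the result is independent of the order of the terms.
    """
    acc = Fraction(a)
    for x, y in zip(xs, ys):
        acc -= Fraction(x) * Fraction(y)
    return float(acc)


def lu_unblocked(A):
    """Unblocked, dot-product based LU with partial pivoting.

    Returns (LU, perm): LU holds the unit lower factor L below the diagonal
    and U on and above it; row i of the permuted matrix is A[perm[i]].
    """
    n = len(A)
    A = [[float(v) for v in row] for row in A]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Update column k on and below the diagonal with exact inner products.
        col_k = [A[m][k] for m in range(k)]
        for i in range(k, n):
            A[i][k] = exact_dot_sub(A[i][k], A[i][:k], col_k)
        # Partial pivoting: first row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            A[k], A[p] = A[p], A[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Update row k of U to the right of the diagonal.
        for j in range(k + 1, n):
            A[k][j] = exact_dot_sub(A[k][j], A[k][:k], [A[m][j] for m in range(k)])
        # Scale the column to obtain the multipliers of L.
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
    return A, perm


if __name__ == "__main__":
    LU, perm = lu_unblocked([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
    print(perm)
    for row in LU:
        print(row)
```

Exact rational arithmetic is far slower than the accumulator-based kernels the abstract refers to; it is used here only because it makes the single-rounding property easy to see.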
Year: 2019
DOI: 10.1177/1094342019832968
Venue: Periodicals
Keywords: LU factorization, BLAS, reproducibility, accuracy, long accumulator, error-free transformation, GPUs
Field: Graphics, Iterative refinement, Computer science, Parallel computing, Pivot element, Scaling, LU decomposition
DocType: Journal
Volume: 33
Issue: 5
ISSN: 1094-3420
Citations: 0
PageRank: 0.34
References: 15
Authors: 4
Name                        Order   Citations   PageRank
Roman Iakymchuk             1       32          5.98
Stef Graillat               2       92          16.06
David Defour                3       131         18.28
Enrique S. Quintana-Ortí    4       1317        150.59