Title: Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice
Abstract: In this paper, we extend the theory and practice regarding algorithmic fault-tolerant matrix-matrix multiplication, C = AB, in a number of ways. First, we propose low-overhead methods for detecting errors introduced not only in C but also in A and/or B. Second, we show that, theoretically, these methods will detect all errors as long as only one entry is corrupted. Third, we propose a low-overhead roll-back approach to correct errors once detected. Finally, we give a high-performance implementation of matrix-matrix multiplication that incorporates these error detection and correction methods. Empirical results demonstrate that these methods work well in practice while imposing an acceptable level of overhead relative to high-performance implementations without fault-tolerance.
Year: 2001
DOI: 10.1109/DSN.2001.941390
Venue: DSN
Keywords: empirical result, error detection, matrix-matrix multiplication, algorithmic fault-tolerant matrix-matrix multiplication, acceptable level, high-performance implementation, fault-tolerant high-performance matrix multiplication, low-overhead method, correction method, low-overhead roll-back approach, error detection and correction, fault tolerance, fault tolerant, linear algebra, fault detection, algorithms, multiplication, space technology, high performance computing, error correction, matrix multiplication, propulsion
Field: Fault detection and isolation, Computer science, Algorithm, Error detection and correction, Implementation, Multiplication, Fault tolerance, Matrix multiplication
DocType: Conference
ISBN: 0-7695-1101-5
Citations: 25
PageRank: 1.69
References: 9
Authors: 4
Name                        Order  Citations  PageRank
John A. Gunnels             1      717        83.20
Robert A. van de Geijn      2      2047       203.08
Daniel S. Katz              3      1496       121.04
Enrique S. Quintana-Ortí    4      1317       150.59