Title
Adaptive Winograd's matrix multiplications
Abstract
Modern architectures have complex memory hierarchies and increasing parallelism (e.g., multicores). These features make achieving and maintaining good performance across rapidly changing architectures increasingly difficult. Performance has become a complex tradeoff, not just a simple matter of counting the cost of simple CPU operations. We present a novel, hybrid, and adaptive recursive Strassen-Winograd's matrix multiplication (MM) that uses automatically tuned linear algebra software (ATLAS) or GotoBLAS. Our algorithm applies to matrices of any size and shape stored in either row- or column-major layout (in double precision in this work) and thus is efficiently applicable to both C and FORTRAN implementations. In addition, our algorithm divides the computation into sub-MMs of equivalent complexity and does not require any extra computation to combine the intermediary sub-MM results. We achieve up to 22% execution-time reduction versus GotoBLAS/ATLAS alone for a single-core system and up to 19% for a two dual-core processor system. Most importantly, even for small matrices such as 1500 × 1500, our approach already attains a 10% execution-time reduction and, for MM of matrices larger than 3000 × 3000, it delivers performance that would correspond, for a classic O(n³) algorithm, to faster-than-processor-peak performance (i.e., our algorithm delivers the equivalent of 5 GFLOPS on a system with a 4.4 GFLOPS peak, where GotoBLAS achieves only 4 GFLOPS). This is a result of the savings in operations (and thus FLOPS). Therefore, our algorithm is faster than any classic MM algorithm could ever be for matrices of this size. Furthermore, we present experimental evidence, based on established methodologies found in the literature, that our algorithm is, for a family of matrices, as accurate as the classic algorithms.
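To illustrate the kind of recursion the abstract describes, here is a minimal sketch of the Strassen-Winograd variant (7 recursive sub-multiplications instead of the classic 8) with a hybrid base case that falls back to a tuned kernel below a cutoff. This is a generic textbook formulation, not the paper's adaptive implementation: it assumes square matrices whose size is a power of two, and NumPy's `@` stands in for the ATLAS/GotoBLAS leaf kernel; the `leaf` cutoff is an illustrative parameter.

```python
import numpy as np

def winograd_mm(A, B, leaf=64):
    """Recursive Strassen-Winograd multiply (a sketch).

    Assumes square matrices whose order is a power of two; below
    `leaf`, it falls back to the tuned kernel (here, NumPy's `@`),
    mimicking the hybrid use of a high-performance BLAS.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B  # stand-in for the ATLAS/GotoBLAS base case
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # 8 pre-additions on the operands ...
    S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
    S5 = B12 - B11; S6 = B22 - S5; S7 = B22 - B12; S8 = S6 - B21

    # ... 7 recursive sub-multiplications (vs. 8 for the classic algorithm) ...
    M1 = winograd_mm(S2, S6, leaf)
    M2 = winograd_mm(A11, B11, leaf)
    M3 = winograd_mm(A12, B21, leaf)
    M4 = winograd_mm(S3, S7, leaf)
    M5 = winograd_mm(S1, S5, leaf)
    M6 = winograd_mm(S4, B22, leaf)
    M7 = winograd_mm(A22, S8, leaf)

    # ... and 7 post-additions to assemble the result quadrants.
    T1 = M1 + M2
    T2 = T1 + M4
    C = np.empty_like(A)
    C[:h, :h] = M2 + M3
    C[:h, h:] = T1 + M5 + M6
    C[h:, :h] = T2 - M7
    C[h:, h:] = T2 + M5
    return C
```

Each level of the recursion trades one multiplication for a constant number of additions, which is where the operation savings (and the "faster than peak" effect for a classic O(n³) accounting) come from.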
Year
2009
DOI
10.1145/1486525.1486528
Venue
ACM Trans. Math. Softw.
Keywords
Additional key words and phrases: Winograd's matrix multiplications, classic algorithm, fast algorithms, adaptive Winograd, execution-time reduction, dual-core processor system, good performance, classic MM algorithm, GFLOPS peak performance, matrix multiplication, single-core system, GFLOPS performance, classic O, peak performance, linear algebra
Field
Linear algebra, Single-core, Mathematical optimization, Matrix (mathematics), FLOPS, Computer science, Parallel computing, Double-precision floating-point format, Fortran, Theoretical computer science, Matrix multiplication, Computation
DocType
Journal
Volume
36
Issue
1
ISSN
0098-3500
Citations
12
PageRank
0.86
References
35
Authors
2
Name              Order  Citations  PageRank
Paolo D'Alberto   1      136        11.24
Alexandru Nicolau 2      2265       307.74