Title
Reliable Generation of High-Performance Matrix Algebra
Abstract
Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in suboptimal performance. The entire sequence needs to be optimized in concert. Instead of vendor-tuned BLAS, a programmer could start with source code in Fortran or C (e.g., based on the Netlib BLAS) and use a state-of-the-art optimizing compiler. However, our experiments show that optimizing compilers often attain only one-quarter the performance of hand-optimized code. In this paper we present a domain-specific compiler for matrix algebra, the Build to Order BLAS (BTO), that reliably achieves high performance using a scalable search algorithm for choosing the best combination of loop fusion, array contraction, and multithreading for data parallelism. The BTO compiler generates code that is between 16% slower and 39% faster than hand-optimized code.
Year
DOI
Venue
2012
10.1145/2629698
ACM Transactions on Mathematical Software
Keywords
Field
DocType
domain specific languages,linear algebra,genetic algorithms
Loop fusion,Source code,Netlib,Computer science,Parallel computing,Fortran,Optimizing compiler,Theoretical computer science,Compiler,Data parallelism,Basic Linear Algebra Subprograms
Journal
Volume
Issue
ISSN
abs/1205.1098
3
0098-3500
Citations 
PageRank 
References 
0
0.34
19
Authors
5
Name
Order
Citations
PageRank
Geoffrey Belter191.54
Elizabeth R. Jessup237049.02
Jeremy G. Siek356345.96
Jeremy G. Siek456345.96
Boyana Norris541739.46