Abstract | ||
---|---|---|
The emergence of high-density reconfigurable hardware devices gives scientists and engineers an option for accelerating their numerical computing applications on low-cost but powerful "FPGA-enhanced computers". In this paper, we introduce our efforts towards improving the computational performance of Basic Linear Algebra Subprograms (BLAS) with FPGA-specific algorithms and methods. Our study focuses on three BLAS subroutines: floating-point summation, matrix-vector multiplication, and matrix-matrix multiplication. They represent all three levels of BLAS functionality, and their sustained computational performance is either memory-bandwidth bound or computation bound. By proposing a group-alignment based floating-point summation method and applying this technique to the other subroutines, we significantly improve their sustained computational performance and reduce numerical errors while consuming only moderate FPGA resources. Compared with existing FPGA-based implementations, our designs are efficient and compact, with improved numerical accuracy and stability. |
Year | Venue | Keywords |
---|---|---|
2007 | ERSA | reconfigurable hardware,basic linear algebra subprograms,matrix multiplication,memory bandwidth,floating point |
Field | DocType | Citations |
Memory bandwidth,Subroutine,Computer science,Floating point,Parallel computing,Field-programmable gate array,Computational science,Multiplication,Computation,Reconfigurable computing,Basic Linear Algebra Subprograms | Conference | 2 |
PageRank | References | Authors |
0.41 | 10 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chuan He | 1 | 55 | 6.23 |
Guan Qin | 2 | 96 | 12.51 |
Richard E. Ewing | 3 | 252 | 45.87 |
Wei Zhao | 4 | 3532 | 404.01 |
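
The abstract names a group-alignment based floating-point summation method but does not describe it. The Python sketch below illustrates the general idea only, under stated assumptions: every value in a group is shifted to the group's largest exponent, the aligned values are added as fixed-point integers, and the result is converted back to floating point once. It is not the authors' FPGA design. The function name `group_aligned_sum` and the `frac_bits` parameter are hypothetical, chosen for illustration.

```python
import math


def group_aligned_sum(values, frac_bits=64):
    """Sum floats by aligning them to the group's largest exponent.

    Illustrative sketch only; `frac_bits` (an assumption) sets how many
    bits below the largest exponent are kept after alignment.
    """
    if not values:
        return 0.0
    # math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1.
    e_max = max(math.frexp(v)[1] for v in values)
    acc = 0  # Python int acts as an unbounded fixed-point accumulator
    for v in values:
        # Express v in units of 2**(e_max - frac_bits); bits below that
        # unit are rounded away, mimicking a fixed-width aligned datapath.
        acc += int(round(math.ldexp(v, frac_bits - e_max)))
    # Convert back to floating point once, after all additions.
    return math.ldexp(acc, e_max - frac_bits)


if __name__ == "__main__":
    print(group_aligned_sum([1e16, 1.0, -1e16]))
```

Because the additions happen on integers, no rounding occurs between terms. In the usage line above, a plain left-to-right double-precision accumulation would absorb the `1.0` into `1e16` and lose it, whereas the aligned integer sum keeps it.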