Title | ||
---|---|---|
MV-FT: Efficient Implementation for Matrix-Vector Multiplication on FT64 Stream Processor |
Abstract | ||
---|---|---|
In this paper, we present a detailed case study of an optimized implementation of a fundamental scientific kernel, matrix-vector multiplication, on FT64, the first 64-bit stream processor designed for scientific computing. The major novelties of our study are as follows. First, we develop four stream programs based on different stream organizations: the dot product, row product, multi-dot product and multi-row product approaches. Second, the optimal strip size for partitioning a large matrix is derived from a practical parameter model. Finally, loop unrolling and software pipelining are used to hide communication behind computation. The experimental results show that the optimized implementations on FT64 achieve high speedup over the corresponding Fortran programs running on Itanium 2. We conclude that, with these programming optimizations, matrix-vector multiplication can efficiently exploit the potential of the FT64 stream processor. |
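
The abstract names several ways of organizing the computation, but this record does not include the paper's stream programs. As a rough illustration only, the sketch below shows two generic formulations of y = A·x in plain Python, plus a strip-partitioned variant. It assumes that "dot product" means one inner product per matrix row and that "row product" means accumulating scaled vectors; the paper's exact definitions may differ. The function names and the `strip_rows` parameter are hypothetical and are not taken from the paper or from any FT64 toolchain.

```python
def matvec_dot(A, x):
    """Inner-product form: y[i] = sum over j of A[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_axpy(A, x):
    """Accumulation form: y += x[j] * (column j of A), one update per element of x."""
    y = [0.0] * len(A)
    for j, x_j in enumerate(x):
        for i in range(len(A)):
            y[i] += A[i][j] * x_j
    return y


def matvec_strips(A, x, strip_rows):
    """Process A in horizontal strips of `strip_rows` rows (hypothetical strip size).

    Each strip is small enough to be handled as one unit of work; the results
    are concatenated in order, so the output matches the unpartitioned product.
    """
    y = []
    for start in range(0, len(A), strip_rows):
        y.extend(matvec_dot(A[start:start + strip_rows], x))
    return y


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [10.0, 1.0]
print(matvec_dot(A, x))        # [12.0, 34.0, 56.0]
print(matvec_axpy(A, x))       # same result, different access pattern
print(matvec_strips(A, x, 2))  # same result, computed two rows at a time
```

All three functions return the same vector. They differ only in the order in which matrix elements are visited, which is the kind of choice a stream organization and a strip size would govern on real hardware.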
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/ICDS.2008.16 | ICDS |
Keywords | Field | DocType |
---|---|---|
matrix multiplication,row product approach,ft64 stream processor,scientific computing,efficient implementation,signal processing,multi-row product approach,detailed case study,scientific kernel implementation,operating system kernels,multi-dot product,vector processor systems,mv-ft optimizing implementation,mathematics computing,matrix partitioning,matrix-vector multiplication,64-bit stream processor,different stream organization,dot product approach,optimal strip size,program control structures,loop unrolling,optimising compilers,optimizing implementation,software pipelining,multirow product approach,stream program,row product,multidot product approach,vectors,pipeline processing,matrix vector multiplication,program optimization | Software pipelining,Computer science,Parallel computing,Itanium,Multiplication,Loop unrolling,Dot product,Stream processing,Matrix multiplication,Speedup | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-0-7695-3087-1 | 2 | 0.37 |
References | Authors |
---|---|
5 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jing Du | 1 | 37 | 8.95 |
Fujiang Ao | 2 | 17 | 4.79 |
Xuejun Yang | 3 | 678 | 73.26 |