Abstract |
---|
A high-performance SYMV (symmetric matrix-vector multiplication) kernel is implemented on Fermi-core GPUs using an atomic-operation based algorithm. The algorithm makes effective use of memory bandwidth and reduces memory usage. On a Tesla C2050, it sustains approximately 43 GFLOPS in double precision and 78 GFLOPS in single precision. On a GeForce GTX580, it reaches 72 GFLOPS in double precision and 128 GFLOPS in single precision. The proposed SYMV kernel outperforms the corresponding kernels of the major CUDA BLAS libraries: CUBLAS, MAGMABLAS, and CULA-BLAS. This performance improvement has a significant impact when the SYMV kernel is plugged into user codes. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-38718-0_9 | HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012 |

Field | DocType | Volume |
---|---|---|
Kernel (linear algebra),Memory bandwidth,Computer science,CUDA,FLOPS,Parallel computing,Performance improvement | Conference | 7851 |

ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 1 | 0.40 |

References | Authors |
---|---|
3 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Toshiyuki Imamura | 1 | 95 | 22.21 |
Susumu Yamada | 2 | 36 | 9.54 |
Masahiko Machida | 3 | 34 | 9.76 |
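
The abstract does not spell out the atomic-operation based algorithm, so the following is only a minimal sketch of one common way to exploit symmetry in SYMV, not the authors' kernel. It assumes a hypothetical packed layout in which `a_upper[i]` stores row `i` of the upper triangle (entries `A[i][i]` through `A[i][n-1]`); the function name `symv_upper` is also an illustration. Each stored off-diagonal element is read once and contributes to two output entries. The second update is a scatter into another row's result, which is the kind of write that a parallel GPU version would have to perform with an atomic add.

```python
def symv_upper(n, a_upper, x):
    """Return y = A * x for a symmetric n-by-n matrix A.

    Only the upper triangle is read. a_upper[i][j - i] holds A[i][j]
    for j >= i (a hypothetical packed-row layout chosen for this sketch).
    """
    y = [0.0] * n
    for i in range(n):
        for offset, a_ij in enumerate(a_upper[i]):
            j = i + offset
            # Contribution of A[i][j] to the result of row i.
            y[i] += a_ij * x[j]
            if j != i:
                # Mirrored contribution of A[j][i] == A[i][j] to row j.
                # In a parallel kernel, several threads could update y[j]
                # at once, so this write would need to be an atomic add.
                y[j] += a_ij * x[i]
    return y


# Small example: A = [[2, 1, 0], [1, 3, 4], [0, 4, 5]], x = [1, 2, 3]
a_upper = [[2.0, 1.0, 0.0], [3.0, 4.0], [5.0]]
y = symv_upper(3, a_upper, [1.0, 2.0, 3.0])
print(y)
```

Reading each off-diagonal element once, rather than once per triangle, roughly halves the matrix traffic from memory, which matters because SYMV is limited by memory bandwidth rather than arithmetic.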