A High Performance SYMV Kernel on a Fermi-Core GPU

Paper Info

Title
A High Performance SYMV Kernel on a Fermi-Core GPU

Abstract
A high-performance SYMV kernel is implemented on Fermi-core GPUs using an algorithm based on atomic operations. The algorithm uses memory bandwidth efficiently and reduces memory usage. On a Tesla C2050, it sustained approximately 43 GFLOPS in double precision and 78 GFLOPS in single precision. On a GeForce GTX580, the proposed SYMV kernel reaches 72 GFLOPS in double precision and 128 GFLOPS in single precision. The proposed kernel outperforms the SYMV kernels of the major CUDA BLAS libraries, namely CUBLAS, MAGMABLAS, and CULA-BLAS. This improvement has a significant impact on performance when the SYMV kernel is plugged into user codes.
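The abstract does not spell out the kernel, but the property any SYMV kernel can exploit is that SYMV, y ← αAx + βy with A symmetric, needs only one triangle of A. Each stored off-diagonal element a_ij (i > j) contributes to y_i through x_j and to y_j through x_i. The serial Python sketch below illustrates that idea under the assumption that only the lower triangle is read. The function name `symv_lower` is hypothetical. The scatter into `out[j]` is the update that a parallel GPU version would have to make safe across threads, for example with atomic additions. This is our reading of an "atomic-operation based" approach, not the authors' published code.

```python
from typing import List


def symv_lower(alpha: float, a: List[List[float]], x: List[float],
               beta: float, y: List[float]) -> List[float]:
    """Return alpha*A*x + beta*y for symmetric A, reading only its lower triangle.

    Hypothetical illustrative helper; entries above the diagonal are never read.
    """
    n = len(x)
    # Start from the scaled input vector beta*y.
    out = [beta * yi for yi in y]
    for i in range(n):
        # The diagonal element contributes exactly once to y_i.
        acc = a[i][i] * x[i]
        for j in range(i):
            aij = a[i][j]                  # each off-diagonal element is loaded once...
            acc += aij * x[j]              # ...used for the row contribution to y_i
            out[j] += alpha * aij * x[i]   # ...and reused as the transposed term for y_j
            # In a parallel GPU kernel, several threads may update out[j]
            # concurrently; this scatter is where an atomic add would be needed.
        out[i] += alpha * acc
    return out


if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0],
         [1.0, 3.0, 4.0],
         [0.0, 4.0, 5.0]]
    print(symv_lower(1.0, A, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0, 0.0]))
```

Because each off-diagonal element is loaded once but used twice, this scheme moves roughly half as much matrix data as a general matrix-vector product on the same matrix. That saving is why memory-bandwidth use dominates SYMV performance.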

Year: 2012
DOI: 10.1007/978-3-642-38718-0_9
Venue: High Performance Computing for Computational Science - VECPAR 2012
Fields: Kernel (linear algebra), Memory bandwidth, Computer science, CUDA, FLOPS, Parallel computing, Performance improvement
Document type: Conference
Volume: 7851
ISSN: 0302-9743
Citations: 1
PageRank: 0.40
References: 3
Authors: 3

Authors


Name	Author order	Citations	PageRank
Toshiyuki Imamura	1	95	22.21
Susumu Yamada	2	36	9.54
Masahiko Machida	3	34	9.76
