Title
A High Performance SYMV Kernel on a Fermi-core GPU
Abstract
A high-performance SYMV kernel is implemented on Fermi-core GPUs using an algorithm based on atomic operations. The algorithm makes effective use of memory bandwidth and reduces memory usage. On a Tesla C2050, sustained performance of approximately 43 GFLOPS in double precision and 78 GFLOPS in single precision was achieved. On a GeForce GTX580, the proposed SYMV kernel reaches 72 GFLOPS and 128 GFLOPS in double and single precision, respectively. The proposed SYMV kernel outperforms the major CUDA BLAS kernels: CUBLAS, MAGMABLAS, and CULA-BLAS. This performance improvement has a significant impact when the SYMV kernel is plugged into user codes.
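The abstract does not spell out the kernel itself, so the following is only a minimal sequential sketch of the general idea behind a symmetric matrix-vector product (SYMV) that reads one triangle of the matrix. It is not the authors' GPU implementation. The function name `symv_upper` and the list-of-lists layout are assumptions made for illustration. Each off-diagonal element is loaded once and contributes to two entries of the result, which is what halves the matrix traffic. In a parallel kernel the second, scattered update can collide with updates from other threads, and that is where an atomic add would plausibly come in.

```python
def symv_upper(a, x):
    """Return y = A @ x for a symmetric A, reading only the upper triangle of a.

    Hypothetical helper for illustration; entries below the diagonal are ignored.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        # The diagonal element contributes to y[i] only.
        y[i] += a[i][i] * x[i]
        for j in range(i + 1, n):
            aij = a[i][j]       # loaded once, used twice
            y[i] += aij * x[j]  # contribution of A[i][j]
            # Contribution of the mirrored element A[j][i]. In a parallel
            # kernel several threads may update the same y[j], so this is the
            # update that would need an atomic add (an assumption here).
            y[j] += aij * x[i]
    return y


if __name__ == "__main__":
    # Values below the diagonal are deliberately junk: they are never read.
    a = [[2.0, 1.0, 0.0],
         [99.0, 3.0, 4.0],
         [99.0, 99.0, 5.0]]
    print(symv_upper(a, [1.0, 2.0, 3.0]))
```

For the 3x3 example, the implied symmetric matrix has rows (2, 1, 0), (1, 3, 4), (0, 4, 5), and the product with (1, 2, 3) is (4, 19, 23).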
Year: 2012
DOI: 10.1007/978-3-642-38718-0_9
Venue: HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012
Field: Kernel (linear algebra),Fermi Gamma-ray Space Telescope,Memory bandwidth,Computer science,CUDA,FLOPS,Parallel computing,Performance improvement
DocType: Conference
Volume: 7851
ISSN: 0302-9743
Citations: 1
PageRank: 0.40
References: 3
Authors: 3
Name               Order  Citations  PageRank
Toshiyuki Imamura  1      95         22.21
Susumu Yamada      2      36         9.54
Masahiko Machida   3      34         9.76