Title
Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks
Abstract
Emerging GPUs have multiple Streaming Multiprocessors (SMs), and each SM comprises CUDA Cores and Tensor Cores. While CUDA Cores handle general computation, Tensor Cores are designed to speed up matrix multiplication for deep learning applications. However, a GPU kernel often uses either CUDA Cores or Tensor Cores, leaving the other processing units idle. Although many prior research works ...
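To make the observation in the abstract concrete, the sketch below contrasts a kernel that only exercises an SM's CUDA Cores with one that only drives its Tensor Cores through the CUDA WMMA API. It is a minimal illustration of the idle-unit problem, not the paper's persistent/elastic-block technique (the abstract is truncated before the proposed solution); the kernel names scale_add and wmma_tile are hypothetical.

#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// CUDA-Core-only kernel: element-wise work; the SM's Tensor Cores sit idle.
__global__ void scale_add(float* out, const float* in, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * in[i] + out[i];
}

// Tensor-Core-only kernel: one warp multiplies a 16x16x16 half-precision tile
// via WMMA; the CUDA Cores do little beyond address arithmetic (needs sm_70+).
__global__ void wmma_tile(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, A, 16);   // leading dimension = 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}

Run separately, each kernel leaves roughly half of every SM's functional units unused; that unused capacity is the intra-SM parallelism the paper's title refers to.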
Year
2021
DOI
10.1109/ICCD53106.2021.00054
Venue
2021 IEEE 39th International Conference on Computer Design (ICCD)
Keywords
Deep learning, Bridges, Schedules, Tensors, Runtime, Conferences, Graphics processing units
DocType
Conference
ISSN
1063-6404
ISBN
978-1-6654-3219-1
Citations
1
PageRank
0.38
References
0
Authors
6
Name            Order   Citations   PageRank
Han Zhao        1       8           1.81
Weihao Cui      2       13          3.27
Quan Chen       3       175         21.86
Jieru Zhao      4       2           2.09
Jingwen Leng    5       49          12.97
Minyi Guo       6       3969        332.25