Abstract |
---|
Emerging GPUs have multiple Streaming Multiprocessors (SMs), each comprising CUDA Cores and Tensor Cores. While CUDA Cores perform general-purpose computation, Tensor Cores are designed to accelerate matrix multiplication for deep learning applications. However, a GPU kernel typically uses either CUDA Cores or Tensor Cores, leaving the other processing units idle. Although many prior research works ... |
Year | DOI | Venue
---|---|---
2021 | 10.1109/ICCD53106.2021.00054 | 2021 IEEE 39th International Conference on Computer Design (ICCD)

Keywords | DocType | ISSN
---|---|---
Deep learning, Bridges, Schedules, Tensors, Runtime, Conferences, Graphics processing units | Conference | 1063-6404

ISBN | Citations | PageRank
---|---|---
978-1-6654-3219-1 | 1 | 0.38

References | Authors
---|---
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Han Zhao | 1 | 8 | 1.81 |
Weihao Cui | 2 | 13 | 3.27 |
Quan Chen | 3 | 175 | 21.86 |
Jieru Zhao | 4 | 2 | 2.09 |
Jingwen Leng | 5 | 49 | 12.97 |
Minyi Guo | 6 | 3969 | 332.25 |