Abstract |
---|
Emerging GPUs have multiple Streaming Multiprocessors (SMs), each comprising CUDA Cores and Tensor Cores. While CUDA Cores perform general-purpose computation, Tensor Cores are designed to accelerate matrix multiplication for deep learning applications. However, a GPU kernel typically uses either CUDA Cores or Tensor Cores, leaving the other processing units idle. Although many prior research works ... |
Year | DOI | Venue
---|---|---
2021 | 10.1109/ICCD53106.2021.00054 | 2021 IEEE 39th International Conference on Computer Design (ICCD)

Keywords | DocType | ISSN
---|---|---
Deep learning, Bridges, Schedules, Tensors, Runtime, Conferences, Graphics processing units | Conference | 1063-6404

ISBN | Citations | PageRank
---|---|---
978-1-6654-3219-1 | 1 | 0.38

References | Authors
---|---
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Han Zhao | 1 | 8 | 1.81 |
Weihao Cui | 2 | 13 | 3.27 |
Quan Chen | 3 | 175 | 21.86 |
Jieru Zhao | 4 | 2 | 2.09 |
Jingwen Leng | 5 | 49 | 12.97 |
Minyi Guo | 6 | 3969 | 332.25 |