Title
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
Abstract
The proliferation of machine learning applications has driven the integration of both CUDA Cores and Tensor Cores into GPUs to meet their acceleration demands. While studies have shown that co-locating multiple tasks on the same GPU can effectively improve system throughput and resource utilization, existing schemes only schedule the resources of traditional CUDA Cores and thus cannot exploit the parallelism between Tensor Cores and CUDA Cores. In this paper, we propose Tacker, a static kernel fusion and scheduling approach that improves GPU utilization of both types of cores while ensuring the QoS (Quality-of-Service) of co-located tasks. Tacker consists of a Tensor-CUDA Core kernel fuser, a duration predictor for fused kernels, and a runtime QoS-aware kernel manager. The kernel fuser flexibly fuses kernels that use Tensor Cores and CUDA Cores, respectively. The duration predictor accurately predicts the duration of fused kernels. Finally, the kernel manager invokes either the fused kernel or the original kernel based on the QoS headroom of latency-critical tasks to improve system throughput. Our experimental results show that Tacker improves the throughput of best-effort applications by 18.6% on average compared with state-of-the-art solutions, while ensuring the QoS of latency-critical tasks.
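The abstract only outlines the fusion idea, so the following is a minimal CUDA sketch of block-level Tensor-CUDA Core kernel fusion under stated assumptions: the two workloads (a WMMA-based GEMM tile and an element-wise SAXPY), the kernel names, and the parameters tcBlocks and chunk are hypothetical and not taken from the paper. It illustrates how one fused kernel can dispatch some thread blocks to Tensor Core work and the rest to CUDA Core work; Tacker's actual fuser, duration predictor, and kernel manager are more general than this.

// Minimal sketch of block-level Tensor-CUDA Core kernel fusion (hypothetical workloads).
// Requires a Tensor Core capable GPU, e.g. nvcc -arch=sm_70.
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// Tensor Core part: one warp computes a 16x16x16 WMMA tile (assumed workload).
__device__ void gemm_tile_wmma(const half* A, const half* B, float* C, int tile) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c;
    wmma::fill_fragment(c, 0.0f);
    wmma::load_matrix_sync(a, A + tile * 256, 16);   // 256 = 16x16 elements per tile
    wmma::load_matrix_sync(b, B + tile * 256, 16);
    wmma::mma_sync(c, a, b, c);
    wmma::store_matrix_sync(C + tile * 256, c, 16, wmma::mem_row_major);
}

// CUDA Core part: each block processes one contiguous SAXPY chunk (assumed workload).
__device__ void saxpy_chunk(float a, const float* x, float* y, int base, int end) {
    for (int i = base + threadIdx.x; i < end; i += blockDim.x)
        y[i] = a * x[i] + y[i];
}

// Fused kernel: the first tcBlocks thread blocks run the Tensor Core workload and the
// remaining blocks run the CUDA Core workload, so one launch keeps both core types busy.
__global__ void fused_kernel(const half* A, const half* B, float* C, int tcBlocks,
                             float alpha, const float* x, float* y, int n, int chunk) {
    if (blockIdx.x < tcBlocks) {
        if (threadIdx.x < 32)                        // one warp per WMMA tile
            gemm_tile_wmma(A, B, C, blockIdx.x);
    } else {
        int base = (blockIdx.x - tcBlocks) * chunk;
        saxpy_chunk(alpha, x, y, base, min(base + chunk, n));
    }
}

A launch such as fused_kernel<<<tcBlocks + beBlocks, 128>>>(...) would let the hardware scheduler co-execute both workloads within one kernel; as the abstract notes, Tacker additionally predicts the fused kernel's duration and falls back to the original kernels when the latency-critical task's QoS headroom is insufficient.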
Year
2022
DOI
10.1109/HPCA53966.2022.00064
Venue
2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Keywords
Tensor Core, GPU Utilization, QoS
DocType
Conference
ISSN
1530-0897
ISBN
978-1-6654-2028-0
Citations
0
PageRank
0.34
References
0
Authors
8
Name           Order  Citations  PageRank
Han Zhao       1      0          0.34
Weihao Cui     2      13         3.27
Quan Chen      3      175        21.86
Youtao Zhang   4      1977       122.84
Yanchao Lu     5      0          0.34
Chao Li        6      344        37.85
Jingwen Leng   7      49         12.97
Minyi Guo      8      3969       332.25