Title
POSTER: Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls
Abstract
In this study, we demonstrate that interference among concurrent kernels can undermine the performance of state-of-the-art intra-SM (streaming multiprocessor) sharing schemes for concurrent kernel execution (CKE) on GPUs. We highlight that cache partitioning techniques proposed for CPUs are not effective for GPUs. We then propose to balance memory accesses and limit the number of in-flight memory instructions issued by concurrent kernels, which reduces memory pipeline stalls. Our proposed schemes significantly improve the performance of two state-of-the-art intra-SM sharing schemes, Warped-Slicer and SMK.
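The abstract's core mechanism, capping how many memory instructions each concurrent kernel may have in flight and steering issue toward the less-loaded kernel, can be illustrated with a small simulation-style sketch. This is a hypothetical model, not the authors' extension of Warped-Slicer or SMK: the names `KernelState` and `MemoryIssueThrottle`, the static per-kernel caps, and the "fewest in-flight instructions first" balancing rule are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class KernelState:
    """Per-kernel bookkeeping for a hypothetical in-flight memory throttle."""
    name: str
    inflight_limit: int  # assumed static cap on outstanding memory instructions
    inflight: int = 0    # memory instructions issued but not yet completed


class MemoryIssueThrottle:
    """Gate memory-instruction issue from concurrent kernels sharing one SM.

    Illustrative model only: caps are fixed by the caller, and "balancing"
    is approximated by preferring the kernel with the fewest outstanding
    memory instructions.
    """

    def __init__(self, limits: dict[str, int]) -> None:
        self.kernels = {name: KernelState(name, cap) for name, cap in limits.items()}

    def can_issue(self, kernel: str) -> bool:
        state = self.kernels[kernel]
        return state.inflight < state.inflight_limit

    def issue(self, kernel: str) -> bool:
        """Issue one memory instruction; return False if the kernel is at its cap."""
        if not self.can_issue(kernel):
            return False
        self.kernels[kernel].inflight += 1
        return True

    def complete(self, kernel: str) -> None:
        """Retire one outstanding memory instruction (e.g. when its data returns)."""
        state = self.kernels[kernel]
        if state.inflight == 0:
            raise ValueError(f"{kernel} has no in-flight memory instructions")
        state.inflight -= 1

    def pick(self, ready: list[str]) -> Optional[str]:
        """Choose which ready kernel issues next, or None if all are throttled.

        Among kernels below their cap, favor the one with the fewest
        outstanding memory instructions so that no single kernel
        monopolizes the memory pipeline.
        """
        candidates = [k for k in ready if self.can_issue(k)]
        if not candidates:
            return None
        return min(candidates, key=lambda k: self.kernels[k].inflight)


if __name__ == "__main__":
    throttle = MemoryIssueThrottle({"kernel_a": 2, "kernel_b": 4})
    for _ in range(6):
        chosen = throttle.pick(["kernel_a", "kernel_b"])
        if chosen is None:
            break
        throttle.issue(chosen)
        print(chosen, {k: s.inflight for k, s in throttle.kernels.items()})
```

The caps here are fixed by the caller; how such limits would be chosen or adapted per kernel is not specified in the abstract, so the sketch leaves that decision open.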
Year
2017
DOI
10.1109/PACT.2017.30
Venue
2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Keywords
GPUs, concurrent kernel execution, memory pipeline, memory subsystem
Field
Kernel (linear algebra), Resource management, Pipeline transport, Computer science, Cache, Parallel computing, Real-time computing, Interference (wave propagation), Throughput, Benchmark (computing)
DocType
Conference
ISSN
1089-795X
ISBN
978-1-5090-6765-7
Citations
2
PageRank
0.36
References
4
Authors
7
Name           Order  Citations  PageRank
Hongwen Dai    1      28         3.14
Zhen Lin       2      35         4.21
Chao Li        3      132        6.04
Chen Zhao      4      15         10.09
Fei Wang       5      203        40.33
Nanning Zheng  6      3975       329.18
Huiyang Zhou   7      994        63.26