Title
Quantifying Data Locality in Dynamic Parallelism in GPUs
Abstract
Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device side (GPU) without host-side (CPU) intervention. One of the major challenges in supporting DP efficiently is to keep the GPU processing elements saturated and to supply them with the required data in a timely fashion. In this paper, we first conduct a limit study on the performance improvements that hardware schedulers can achieve when they are provided with accurate data reuse information. We next propose LASER, a Locality-Aware SchedulER, in which the hardware schedulers employ data reuse monitors to guide scheduling decisions and improve data locality at runtime. Experimental results on 16 benchmarks show that LASER improves performance by 11.3% on average.
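To make the idea of scheduling on data reuse information concrete, here is a minimal toy model in Python. It is not the paper's LASER mechanism: the names `SM`, `simulate`, and `workload`, the LRU cache model, and the greedy "largest overlap" rule are all assumptions chosen for illustration, and the model ignores timing, load balancing, and everything else a real hardware scheduler must handle. It only shows how the placement of child kernels can change cache hit rate when kernels reuse the same data.

```python
from collections import OrderedDict


class SM:
    """Hypothetical streaming multiprocessor with a small LRU cache of line IDs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def overlap(self, lines):
        # Number of the kernel's cache lines already resident on this SM.
        return sum(1 for line in lines if line in self.cache)

    def run(self, lines):
        # Access every line once, updating LRU state and hit counters.
        for line in lines:
            self.accesses += 1
            if line in self.cache:
                self.hits += 1
                self.cache.move_to_end(line)
            else:
                self.cache[line] = True
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict least recently used


def simulate(kernels, num_sms, capacity, locality_aware):
    """Return the overall cache hit rate for one placement policy."""
    sms = [SM(capacity) for _ in range(num_sms)]
    for i, lines in enumerate(kernels):
        if locality_aware:
            # Assumed heuristic: prefer the SM holding most of the kernel's data,
            # breaking ties in favor of the SM that has done the least work.
            target = max(sms, key=lambda sm: (sm.overlap(lines), -sm.accesses))
        else:
            # Baseline: plain round-robin placement.
            target = sms[i % num_sms]
        target.run(lines)
    total_accesses = sum(sm.accesses for sm in sms)
    return sum(sm.hits for sm in sms) / total_accesses


# Hypothetical workload: child kernels arrive in pairs that share a data region.
region_a = [0, 1, 2, 3]
region_b = [100, 101, 102, 103]
workload = [region_a, region_a, region_b, region_b] * 2

print("round-robin hit rate:   ", simulate(workload, 2, 4, locality_aware=False))
print("locality-aware hit rate:", simulate(workload, 2, 4, locality_aware=True))
```

In this contrived workload, round-robin placement alternates the two regions on each SM, so every access evicts data that the next kernel on that SM would have reused, while the overlap-based rule keeps each region on one SM. Real workloads will not separate this cleanly, and the gap shown here says nothing about the 11.3% figure reported above.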
Year
2019
DOI
10.1145/3376930.3376947
Venue
ACM SIGMETRICS Performance Evaluation Review
Keywords
data reuse, gpgpu, performance evaluation
DocType
Conference
Volume
47
Issue
1
ISSN
0163-5999
ISBN
978-1-4503-6678-6
Citations
0
PageRank
0.34
References
0
Authors
6
Name
Order
Citations
PageRank
Xulong Tang11287.49
Ashutosh Pattnaik21134.70
Onur Kayıran335613.47
Adwait Jog456823.32
Mahmut Taylan Kandemir53811.03
Chita R. Das6103859.34