Locality-Aware Mapping of Nested Parallel Patterns on GPUs - Citegraph

Paper Info

Title
Locality-Aware Mapping of Nested Parallel Patterns on GPUs

Abstract
Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which is common in non-trivial applications. To address this issue, we present a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs. The analysis maps nested patterns onto a logical multidimensional domain and parameterizes the block size and degree of parallelism in each dimension. We then add GPU-specific hard and soft constraints to prune the space of possible mappings and select the best mapping. We also perform multiple compiler optimizations that are guided by the mapping to avoid dynamic memory allocations and automatically utilize shared memory within GPU kernels. We compare the performance of our automatically selected mappings to hand-optimized implementations on multiple benchmarks and show that the average performance gap on 7 out of 8 benchmarks is 24%. Furthermore, our mapping strategy outperforms simple 1D mappings and existing 2D mappings by up to 28.6x and 9.6x respectively.
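The prune-then-select idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a two-level nest, and the function names (`candidate_mappings`, `satisfies_hard`, `soft_score`, `select_mapping`), the scoring weights, and the 256-thread occupancy threshold are all hypothetical choices made for illustration. The only hardware facts used are the 1024-thread block limit and the 32-thread warp size of current CUDA GPUs.

```python
from itertools import product

MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit on current CUDA GPUs
WARP_SIZE = 32                # threads per warp on CUDA GPUs
BLOCK_SIZES = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)


def candidate_mappings():
    """Enumerate dimension and block-size choices for a two-level nest."""
    for dims in (("y", "x"), ("x", "y")):
        for b_outer, b_inner in product(BLOCK_SIZES, repeat=2):
            yield {"outer": (dims[0], b_outer), "inner": (dims[1], b_inner)}


def satisfies_hard(mapping):
    """Hard constraint: the block must not exceed the per-block thread limit."""
    return mapping["outer"][1] * mapping["inner"][1] <= MAX_THREADS_PER_BLOCK


def soft_score(mapping, contiguous_level):
    """Soft constraints, expressed as an additive score (weights are assumed)."""
    score = 0
    dim, block_size = mapping[contiguous_level]
    # Prefer the level that walks contiguous memory on dimension x with a
    # warp-multiple block size, so neighbouring threads touch neighbouring data.
    if dim == "x" and block_size % WARP_SIZE == 0:
        score += 10
    # Prefer blocks with enough threads to keep the device busy
    # (256 is an assumed threshold, not a value from the paper).
    if mapping["outer"][1] * mapping["inner"][1] >= 256:
        score += 1
    return score


def select_mapping(contiguous_level="inner"):
    """Prune with the hard constraint, then keep a best-scoring candidate."""
    valid = [m for m in candidate_mappings() if satisfies_hard(m)]
    return max(valid, key=lambda m: soft_score(m, contiguous_level))


if __name__ == "__main__":
    print(select_mapping("inner"))
```

Hard constraints act as a filter and soft constraints as a ranking, so a mapping that cannot launch is never scored. A fuller treatment would also parameterize the degree of parallelism in each dimension, which this sketch leaves out.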

Year	2014
DOI	10.1109/MICRO.2014.23
Venue	MICRO
Keywords	parallel processing, gpu-specific hard constraints, gpu-specific soft constraints, parallelism degree, shared memory, gpu hardware, 2d mapping, block size parameterization, parallel semantics encoding, 1d mapping, graphics processing units, gpu kernels, high level computation patterns, logical multidimensional domain, nested parallel pattern, higher level languages, locality-aware mapping, general analysis framework, shared memory systems, pattern mapping, compiler optimizations, optimising compilers, error correcting code, resilience, hardware, kernel, optimization, dram, faults, programming, instruction sets
Field	Kernel (linear algebra), Locality, Shared memory, Computer science, Degree of parallelism, CUDA, Instruction set, Parallel computing, Optimizing compiler, Code generation, Theoretical computer science
DocType	Conference
ISSN	1072-4451
Citations	20
PageRank	0.88
References	19
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
HyoukJoong Lee	1	414	17.71
Kevin J. Brown	2	448	18.62
Arvind K. Sujeeth	3	502	20.58
Tiark Rompf	4	743	45.86
Kunle Olukotun	5	4532	373.50
