Title
Exploring dynamic parallelism in OpenMP
Abstract
GPU devices are becoming a common element in current HPC platforms due to their high performance-per-Watt ratio. However, developing applications able to exploit their performance is not a trivial task, and it becomes even harder when the applications have irregular data access patterns or control flows. Dynamic Parallelism (DP) has been introduced in the most recent GPU architecture as a mechanism to improve the applicability of GPU computing in these situations, as well as resource utilization and execution performance. DP allows a kernel to launch another kernel without intervention of the CPU. Current experience reveals that DP is offered to programmers at the expense of an excessive overhead which, together with its architecture dependency, makes it difficult to see the benefits in real applications. In this paper, we propose how to extend the current OpenMP accelerator model to make the use of DP easy and effective. The proposal is based on nesting of teams constructs and conditional clauses, and we show how the compiler can generate code that is then efficiently executed under dynamic runtime scheduling. The proposal has been implemented in the MACC compiler, which supports the OmpSs task-based programming model, and evaluated using three kernels with data access and computation patterns commonly found in real applications: sparse matrix vector multiplication, breadth-first search and divide-and-conquer Mandelbrot. Performance results show speed-ups in the 40x range relative to versions not using DP.
Year
2015
DOI
10.1145/2832105.2832113
Venue
WACCPD@SC
Field
Central processing unit, Programming paradigm, Scheduling (computing), Computer science, CUDA, Sparse matrix-vector multiplication, Parallel computing, Compiler, General-purpose computing on graphics processing units, Data access, Distributed computing
DocType
Conference
Citations
2
PageRank
0.56
References
5
Authors
3
Name              Order   Citations   PageRank
Guray Ozen        1       4           1.27
Eduard Ayguadé    2       2406        216.00
Jesús Labarta     3       1862        165.09