Title
Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer
Abstract
We present a framework, based on the QuickSched[1] library, that implements priority-aware task-based parallelism directly on CUDA GPUs. This allows large computations with complex data dependencies to be executed in a single GPU kernel call, removing the synchronization points that would otherwise be required between kernel calls. In this paradigm, data transfers to and from the GPU are modelled as load and unload tasks. These tasks are generated automatically and executed alongside the computational tasks, allowing fully asynchronous and concurrent data transfers. We implemented a tiled QR decomposition and a Barnes-Hut gravity calculation, both of which show significant improvement when utilising the task-based setup, effectively eliminating the latencies due to data transfers between the GPU and the CPU. This shows that task-based parallelism is a valid alternative programming paradigm on GPUs, and can provide significant gains from both a data-transfer and an ease-of-use perspective.
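As an illustration only (not the authors' implementation, which runs inside a single CUDA kernel), the following Python sketch shows how load and unload tasks might be generated automatically from the resources each compute task reads and writes, and then ordered by a priority-aware scheduler. The names Task, add_dependency, add_transfer_tasks and run are hypothetical and are not QuickSched's API. The rule that a load task inherits the highest priority of its consumers is also an assumption made for this sketch.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    priority: int = 0        # larger value runs first (assumed convention)
    reads: tuple = ()        # resources that must be present on the device
    writes: tuple = ()       # resources modified on the device
    unlocks: list = field(default_factory=list)  # tasks waiting on this one
    wait: int = 0            # number of unresolved dependencies


def add_dependency(before, after):
    """Record that `after` may only run once `before` has finished."""
    before.unlocks.append(after)
    after.wait += 1


def add_transfer_tasks(compute_tasks):
    """Generate one load task per resource used and one unload task per
    resource written, and wire them into the dependency graph."""
    loads, unloads = {}, {}
    for task in compute_tasks:
        # dict.fromkeys removes duplicates while keeping a stable order.
        for res in dict.fromkeys(tuple(task.reads) + tuple(task.writes)):
            if res not in loads:
                loads[res] = Task(f"load({res})")
            # Assumption: a load is as urgent as its most urgent consumer.
            loads[res].priority = max(loads[res].priority, task.priority)
            add_dependency(loads[res], task)
        for res in dict.fromkeys(task.writes):
            if res not in unloads:
                unloads[res] = Task(f"unload({res})")
            add_dependency(task, unloads[res])
    return list(loads.values()) + list(compute_tasks) + list(unloads.values())


def run(tasks):
    """Serial stand-in for the scheduler: repeatedly pick the ready task
    with the highest priority and release the tasks it unlocks."""
    ready = [(-t.priority, i, t) for i, t in enumerate(tasks) if t.wait == 0]
    heapq.heapify(ready)
    counter = len(tasks)     # unique tie-breaker so Task objects are never compared
    order = []
    while ready:
        _, _, task = heapq.heappop(ready)
        order.append(task.name)
        for nxt in task.unlocks:
            nxt.wait -= 1
            if nxt.wait == 0:
                heapq.heappush(ready, (-nxt.priority, counter, nxt))
                counter += 1
    return order


# Two hypothetical compute tasks: update(B) needs the result of factor(A).
a = Task("factor(A)", priority=2, reads=("A",), writes=("A",))
b = Task("update(B)", priority=1, reads=("A", "B"), writes=("B",))
add_dependency(a, b)
order = run(add_transfer_tasks([a, b]))
print(order)
```

In this toy run, the load of B is not scheduled until after factor(A), because factor(A) has the higher priority and is already unlocked by the load of A. On a real GPU the two would proceed concurrently, which is the overlap of transfer and computation the abstract describes.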
Year: 2015
DOI: 10.3233/978-1-61499-621-7-683
Venue: Parallel Computing: On the Road to Exascale
Keywords: Task-based parallelism, General-purpose GPU computing, Asynchronous data transfer
Field: Instruction-level parallelism, Asynchronous communication, Computer architecture, Data transmission, Task parallelism, Computer science, Parallel computing, Theoretical computer science, Data parallelism
DocType: Conference
Volume: 27
ISSN: 0927-5452
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                  Order   Citations   PageRank
Aidan B. G. Chalk     1       4           2.18
Pedro Gonnet          2       89          13.43
Matthieu Schaller     3       4           2.85