Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer - Citegraph

Paper Info

Title
Using Task-Based Parallelism Directly On The Gpu For Automated Asynchronous Data Transfer

Abstract
We present a framework, based on the QuickSched [1] library, that implements priority-aware task-based parallelism directly on CUDA GPUs. This allows large computations with complex data dependencies to be executed in a single GPU kernel call, removing any synchronization points that might otherwise be required between kernel calls. Using this paradigm, data transfers to and from the GPU are modelled as load and unload tasks. These tasks are automatically generated and executed alongside the rest of the computational tasks, allowing fully asynchronous and concurrent data transfers. We implemented a tiled QR decomposition and a Barnes-Hut gravity calculation, both of which show significant improvement when utilising the task-based setup, effectively eliminating latencies due to data transfers between the GPU and the CPU. This shows that task-based parallelism is a valid alternative programming paradigm on GPUs and can provide significant gains in both data transfer and ease of use.
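The idea of modelling transfers as tasks can be illustrated with a small sequential simulation. This is a minimal sketch, not the QuickSched API or the authors' GPU implementation: the names `Task`, `depends`, `add_transfer_tasks` and `run` are hypothetical, and it assumes one load and one unload task per data resource, with loads given a higher priority than the compute tasks that need them. Each compute task waits on the loads of the data it uses, and each unload waits on every compute task that uses that data, so transfers are scheduled by the same dependency and priority machinery as the computation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(eq=False)
class Task:
    """A unit of work; `kind` is 'load', 'compute' or 'unload' (hypothetical names)."""
    name: str
    kind: str = "compute"
    priority: int = 0                            # larger value = run earlier among ready tasks
    uses: list = field(default_factory=list)     # names of data resources this task touches
    unlocks: list = field(default_factory=list)  # tasks that must wait for this one
    waits: int = 0                               # number of unfinished prerequisites


def depends(before, after):
    """Record that `after` may only start once `before` has finished."""
    before.unlocks.append(after)
    after.waits += 1


def add_transfer_tasks(tasks):
    """Generate one load and one unload task per resource used by compute tasks."""
    resources = sorted({r for t in tasks for r in t.uses})
    new = []
    for r in resources:
        users = [t for t in tasks if r in t.uses]
        # Assumption: loads outrank their users so copies are issued as early as possible.
        load = Task(f"load {r}", kind="load", priority=max(u.priority for u in users) + 1)
        unload = Task(f"unload {r}", kind="unload", priority=-1)
        for u in users:
            depends(load, u)    # data must be on the device before any user runs
            depends(u, unload)  # data is copied back only after every user finishes
        new += [load, unload]
    return tasks + new


def run(tasks):
    """Execute ready tasks in priority order (sequentially); return the execution trace."""
    order = {id(t): i for i, t in enumerate(tasks)}  # stable tie-breaker for equal priorities
    ready = [(-t.priority, order[id(t)], t) for t in tasks if t.waits == 0]
    heapq.heapify(ready)
    trace = []
    while ready:
        _, _, t = heapq.heappop(ready)
        trace.append(t.name)  # stand-in for issuing the copy or running the kernel work
        for nxt in t.unlocks:
            nxt.waits -= 1
            if nxt.waits == 0:
                heapq.heappush(ready, (-nxt.priority, order[id(nxt)], nxt))
    return trace


if __name__ == "__main__":
    a = Task("compute A", priority=2, uses=["x", "y"])
    b = Task("compute B", priority=1, uses=["y"])
    print(run(add_transfer_tasks([a, b])))
```

Running the example prints `['load x', 'load y', 'compute A', 'compute B', 'unload x', 'unload y']`. On a GPU, many thread blocks would pull ready tasks from a shared queue concurrently, so a load for one resource could overlap with computation on another; this single-threaded loop only shows the dependency and priority ordering.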

Year: 2015
DOI: 10.3233/978-1-61499-621-7-683
Venue: Parallel Computing: On the Road to Exascale
Keywords: Task-based parallelism, general-purpose GPU computing, asynchronous data transfer
Field: Instruction-level parallelism, Asynchronous communication, Computer architecture, Data transmission, Task parallelism, Computer science, Parallel computing, Theoretical computer science, Data parallelism
DocType: Conference
Volume: 27
ISSN: 0927-5452
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Authors

Name	Order	Citations	PageRank
Aidan B. G. Chalk	1	4	2.18
Pedro Gonnet	2	89	13.43
Matthieu Schaller	3	4	2.85