Title
Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs
Abstract
To exploit the full potential of GPGPUs for general-purpose computing, the DOACROSS (DOACR) parallelism abundant in scientific and engineering applications must be harnessed. However, the cross-iteration data dependences in DOACR loops are an obstacle to executing their computations concurrently with a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich in DOACR parallelism to identify optimization principles and strategies that allow their efficient mapping to GPGPUs. Our main finding is that certain DOACR loops can be accelerated further on GPGPUs if they are algorithmically restructured (by a domain expert) to be more amenable to GPGPU parallelization, judiciously optimized (by the compiler), and carefully tuned (by a performance-tuning tool). We substantiate this finding with a case study that presents a new parallel SSOR method admitting more efficient data-parallel SIMD execution than red-black SOR on GPGPUs. Our solution is obtained unconventionally: we start from a K-layer SSOR method and then parallelize it with a non-dependence-preserving scheme consisting of a new domain decomposition technique followed by a generalized loop tiling. Despite its slower convergence, our new method outperforms red-black SOR by striking a better balance between data reuse and parallelism and by trading convergence rate for SIMD parallelism. Our experimental results highlight the importance of synergy among domain experts, compiler optimizations, and performance tuning in maximizing the performance of applications, particularly PDE-based DOACR loops, on GPGPUs.
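The difference between a DOACROSS sweep and red-black SOR can be illustrated with a minimal Python sketch. It assumes a 2D Poisson problem -Δu = f on the unit square, discretized with the standard 5-point stencil, zero Dirichlet boundaries, f = 1, and ω = 1.5; the function names (relax, ssor_sweep, red_black_sweep, solve) are hypothetical. It shows the textbook lexicographic SSOR and red-black SOR only, not the paper's K-layer SSOR method or its non-dependence-preserving parallelization.

```python
# Hypothetical sketch: lexicographic SSOR (DOACROSS) vs. red-black SOR (DOALL
# within each color) for the 5-point discretization of -laplace(u) = f on the
# unit square with u = 0 on the boundary. Not the paper's K-layer method.


def make_grid(n):
    # (n+2) x (n+2) grid; the outer ring holds the fixed zero boundary values.
    return [[0.0] * (n + 2) for _ in range(n + 2)]


def relax(u, f, h2, i, j, omega):
    # One SOR point update: blend the old value with the Gauss-Seidel value.
    gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])
    u[i][j] = (1.0 - omega) * u[i][j] + omega * gs


def ssor_sweep(u, f, h2, omega):
    n = len(u) - 2
    # Forward sweep: point (i, j) reads u[i-1][j] and u[i][j-1], which were
    # written earlier in this same sweep (a cross-iteration dependence).
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            relax(u, f, h2, i, j, omega)
    # Backward sweep: the mirror image, reading freshly written (i+1, j), (i, j+1).
    for i in range(n, 0, -1):
        for j in range(n, 0, -1):
            relax(u, f, h2, i, j, omega)


def red_black_sweep(u, f, h2, omega):
    n = len(u) - 2
    for color in (0, 1):
        # Points of one color read only neighbors of the other color, so the
        # updates within a color are independent and could run one per thread.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i + j) % 2 == color:
                    relax(u, f, h2, i, j, omega)


def residual(u, f, h2):
    # Max-norm residual of the scaled 5-point equations 4u - neighbors = h^2 f.
    n = len(u) - 2
    r = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            lap = 4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j] - u[i][j - 1] - u[i][j + 1]
            r = max(r, abs(lap - h2 * f[i][j]))
    return r


def solve(sweep, n=16, omega=1.5, tol=1e-10, max_sweeps=10000):
    # Iterate the given sweep until the residual drops below tol.
    h = 1.0 / (n + 1)
    f = make_grid(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            f[i][j] = 1.0
    u = make_grid(n)
    for k in range(1, max_sweeps + 1):
        sweep(u, f, h * h, omega)
        if residual(u, f, h * h) < tol:
            return u, k
    return u, max_sweeps


u_ssor, k_ssor = solve(ssor_sweep)
u_rb, k_rb = solve(red_black_sweep)
print(k_ssor, k_rb)
```

Both orderings converge to the same discrete solution. The lexicographic sweep can only be parallelized by respecting its dependences (for example, along anti-diagonal wavefronts), whereas each red or black half-sweep has no intra-sweep dependences, which is what makes red-black SOR a natural SIMD baseline.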
Year: 2010
DOI: 10.1109/ICPP.2010.13
Venue: ICPP
Keywords: optimisation, doacr parallelism, k-layer ssor method, parallel processing, graphics processing unit, harnessing doacross parallelism, loop tiling, parallel ssor method, sor, red-black sor, data parallel simd execution, optimization principles, domain decomposition technique, general purpose computing, simd parallelism, doacross parallelism, multiprocessing systems, nondependence preserving scheme, certain doacr loop, new parallel ssor method, generalized loop tiling, iterative pde solvers, partial differential equations, gpgpu, multigpgpu, pde based doacr loops, new domain decomposition technique, domain expert, doacr loop, iterative methods, new method, cross iteration data, instruction sets, domain decomposition, optimization, compiler optimization, convergence, convergence rate, kernel
Field: Task parallelism, Computer science, Parallel computing, SIMD, Optimizing compiler, Compiler, Loop tiling, Data parallelism, General-purpose computing on graphics processing units, Performance tuning, Distributed computing
DocType: Conference
ISSN: 0190-3918
E-ISBN: 978-0-7695-4156-3
ISBN: 978-0-7695-4156-3
Citations: 8
PageRank: 0.56
References: 17
Authors: 5
Name           Order  Citations  PageRank
Peng Di        1      59         4.71
Qing Wan       2      29         5.03
Xuemeng Zhang  3      27         3.45
Hui Wu         4      66         13.24
Jingling Xue   5      1627       124.20