Weak execution ordering - exploiting iterative methods on many-core GPUs - Citegraph

Paper Info

Title
Weak execution ordering - exploiting iterative methods on many-core GPUs

Abstract
On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When fine-grained data communication and synchronization are required for large-scale parallel programs executed by multiple thread blocks, frequent host synchronizations are necessary, and they incur significant overhead. In this paper, we investigate a class of applications that use a chaotic version of iterative methods [5], [22] to obtain numerical solutions of partial differential equations (PDEs). Such a fast PDE solver is parallelized on GPUs with multiple thread blocks. In this parallel implementation, although frequent data communication is needed between adjacent thread blocks, a precise order of the data communication is not necessary. Separate communication threads periodically exchange boundary values with adjacent thread blocks through global memory. Because a precise order of the data communication is not required, the computation and communication threads can be overlapped to alleviate the communication overhead. Performance measurements of two popular applications, Poisson image editing from computer graphics and shape from shading from computer vision, on a Tesla C1060 show that a speedup of 4-5 times is achievable for both applications compared with the solution using host synchronization.
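The core idea in the abstract, blocks that keep relaxing on boundary values that may be a few iterations out of date and refresh them only periodically, can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not the authors' CUDA implementation: a 1D Laplace problem with Dirichlet ends stands in for the paper's 2D PDEs, a Python loop stands in for parallel thread blocks, and the names `block_relax`, `global_residual`, and `exchange_every` are hypothetical. Setting `exchange_every=1` refreshes every halo after each sweep, which reproduces ordinary synchronous Jacobi (the analogue of synchronizing through the host once per iteration); larger values let each block keep iterating on stale halos.

```python
# Illustrative sequential simulation of block relaxation with stale halos.
# Assumptions (not from the paper): 1D problem -u'' = 0 with u(0)=left,
# u(1)=right; each "block" owns a contiguous slice of the grid and only sees
# its neighbours' edge values when halos are refreshed every `exchange_every` sweeps.


def global_residual(u, left, right):
    """Max-norm residual of the discrete equations u[i] = (u[i-1] + u[i+1]) / 2."""
    padded = [left] + u + [right]
    return max(abs(padded[i] - (padded[i - 1] + padded[i + 1]) / 2)
               for i in range(1, len(padded) - 1))


def block_relax(n, num_blocks, exchange_every, left=0.0, right=1.0,
                tol=1e-10, max_sweeps=1_000_000):
    """Return (solution, sweeps) for n interior grid points split into blocks."""
    if n % num_blocks != 0 or exchange_every < 1:
        raise ValueError("need n divisible by num_blocks and exchange_every >= 1")
    size = n // num_blocks
    blocks = [[0.0] * size for _ in range(num_blocks)]
    # Each block's private, possibly stale, copy of its left/right ghost values.
    halos = [[left if b == 0 else 0.0,
              right if b == num_blocks - 1 else 0.0] for b in range(num_blocks)]

    for sweep in range(1, max_sweeps + 1):
        # Computation: every block does one Jacobi sweep using only local data.
        for b in range(num_blocks):
            padded = [halos[b][0]] + blocks[b] + [halos[b][1]]
            blocks[b] = [(padded[i - 1] + padded[i + 1]) / 2
                         for i in range(1, size + 1)]
        if sweep % exchange_every:
            continue  # no communication this sweep: halos stay stale
        # Communication: copy each neighbour's current edge value into the halo.
        for b in range(num_blocks):
            if b > 0:
                halos[b][0] = blocks[b - 1][-1]
            if b < num_blocks - 1:
                halos[b][1] = blocks[b + 1][0]
        u = [x for blk in blocks for x in blk]
        if global_residual(u, left, right) < tol:
            return u, sweep
    raise RuntimeError("no convergence within max_sweeps")


if __name__ == "__main__":
    n = 16
    exact = [(i + 1) / (n + 1) for i in range(n)]  # discrete solution is u(x) = x
    for every in (1, 4):
        u, sweeps = block_relax(n, num_blocks=4, exchange_every=every)
        err = max(abs(a - b) for a, b in zip(u, exact))
        print(f"exchange_every={every}: {sweeps} sweeps, max error {err:.1e}")
```

In this model problem both settings converge to the same discrete solution: staleness can change how many sweeps are needed, but relaxation of this diagonally dominant system tolerates boundary values that lag by a bounded number of sweeps, so the result does not depend on exactly when each halo update arrives.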

Year: 2010
DOI: 10.1109/ISPASS.2010.5452028
Venue: ISPASS
Keywords: many-core gpu,tesla c1060,data communication,data communication equipment,weak execution ordering,computer graphics,host synchronization,shape from shading,parallel thread blocks,computer vision,partial differential equations,coprocessors,poisson image editing,iterative methods,multicore processing,computer graphic,iteration method,application software,partial differential equation
Field: Synchronization,Computer science,CUDA,Parallel computing,Thread (computing),Thread safety,Coprocessor,Graphics processing unit,Multi-core processor,Speedup
DocType: Conference
ISBN: 978-1-4244-6024-3
Citations: 1
PageRank: 0.43
References: 9
Authors: 6

Authors

Name	Author order	Author citations	Author PageRank
Jianmin Chen	1	895	28.70
Zhuo Huang	2	8	1.31
Feiqi Su	3	29	3.57
Jih-Kwon Peir	4	248	34.53
Jeff Ho	5	1	0.43
Peng Lu	6	24	2.54
