Title
An optimal parallel prefix-sums algorithm on the memory machine models for GPUs
Abstract
The main contribution of this paper is to show optimal algorithms that compute the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs, respectively. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. We then show that $\Omega({n\over w}+{nl\over p}+l\log n)$ time units are necessary to compute the sum. Finally, we present an optimal parallel algorithm that computes the prefix-sums of n numbers in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM.
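For illustration only: the sketch below is not the DMM/UMM algorithm analyzed in the paper; it is a minimal CUDA example of the underlying idea of a log-depth parallel prefix-sums (scan). It computes an inclusive scan of a single block of at most 1024 numbers held in shared memory using the classic Hillis-Steele doubling scheme, which takes O(log n) parallel steps. The kernel name, fixed block size, and sample input are assumptions made for this example.

// Minimal illustrative sketch (not the paper's DMM/UMM algorithm):
// inclusive prefix-sums of one block of n <= 1024 numbers in CUDA
// shared memory via Hillis-Steele doubling, O(log n) parallel steps.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void inclusiveScanBlock(const int *in, int *out, int n) {
    __shared__ int temp[1024];           // on-chip shared memory (the resource the DMM models)
    int tid = threadIdx.x;
    if (tid < n) temp[tid] = in[tid];
    __syncthreads();

    // After the step with this offset, temp[i] holds the sum of the
    // 2*offset inputs ending at position i (clipped at the left end).
    for (int offset = 1; offset < n; offset <<= 1) {
        int val = 0;
        if (tid >= offset && tid < n) val = temp[tid - offset];
        __syncthreads();                 // finish all reads before any write
        if (tid >= offset && tid < n) temp[tid] += val;
        __syncthreads();                 // finish all writes before the next step
    }
    if (tid < n) out[tid] = temp[tid];
}

int main() {
    const int n = 8;
    int h_in[n] = {1, 2, 3, 4, 5, 6, 7, 8};
    int h_out[n];
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    inclusiveScanBlock<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%d ", h_out[i]);   // expect: 1 3 6 10 15 21 28 36
    printf("\n");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}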
Year
2012
DOI
10.1007/978-3-642-33078-0_8
Venue
ICA3PP
Keywords
optimal parallel prefix-sums algorithm, memory access latency l, time unit, width w, n number, number p, memory machine model, shared memory, discrete memory machine, global memory, log n, parallel algorithm
Field
Binary logarithm, Shared memory, CUDA, Computer science, Parallel algorithm, Parallel computing, Algorithm, Thread (computing), Omega, Machine models, Distributed computing, Parallel prefix
DocType
Conference
Citations
15
PageRank
1.01
References
9
Authors
1
Name
Koji Nakano
Order
1
Citations
1165
PageRank
118.13