Title
An optimal parallel prefix-sums algorithm on the memory machine models for GPUs
Abstract
The main contribution of this paper is to show optimal algorithms that compute the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs, respectively. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. We then show that $\Omega({n\over w}+{nl\over p}+l\log n)$ time units are necessary to compute the sum. Finally, we present an optimal parallel algorithm that computes the prefix-sums of n numbers in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM.
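For illustration only: the sketch below is not the DMM/UMM algorithm analyzed in the paper; it is a minimal CUDA example of the underlying idea of a log-depth parallel prefix-sums (scan). It computes an inclusive scan of a single block of at most 1024 numbers held in shared memory using the classic Hillis-Steele doubling scheme, which takes O(log n) parallel steps. The kernel name, fixed block size, and sample input are assumptions made for this example.

// Minimal illustrative sketch (not the paper's DMM/UMM algorithm):
// inclusive prefix-sums of one block of n <= 1024 numbers in CUDA
// shared memory via Hillis-Steele doubling, O(log n) parallel steps.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void inclusiveScanBlock(const int *in, int *out, int n) {
    __shared__ int temp[1024];           // on-chip shared memory (the resource the DMM models)
    int tid = threadIdx.x;
    if (tid < n) temp[tid] = in[tid];
    __syncthreads();

    // After the step with this offset, temp[i] holds the sum of the
    // 2*offset inputs ending at position i (clipped at the left end).
    for (int offset = 1; offset < n; offset <<= 1) {
        int val = 0;
        if (tid >= offset && tid < n) val = temp[tid - offset];
        __syncthreads();                 // finish all reads before any write
        if (tid >= offset && tid < n) temp[tid] += val;
        __syncthreads();                 // finish all writes before the next step
    }
    if (tid < n) out[tid] = temp[tid];
}

int main() {
    const int n = 8;
    int h_in[n] = {1, 2, 3, 4, 5, 6, 7, 8};
    int h_out[n];
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    inclusiveScanBlock<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%d ", h_out[i]);   // expect: 1 3 6 10 15 21 28 36
    printf("\n");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}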
Year
2012
DOI
10.1007/978-3-642-33078-0_8
Venue
ICA3PP
Keywords
optimal parallel prefix-sums algorithm, memory access latency l, time unit, width w, n number, number p, memory machine model, shared memory, discrete memory machine, global memory, log n, parallel algorithm
Field
Binary logarithm, Shared memory, CUDA, Computer science, Parallel algorithm, Parallel computing, Algorithm, Thread (computing), Omega, Machine models, Distributed computing, Parallel prefix
DocType
Conference
Citations
15
PageRank
1.01
References
9
Authors
1
Name
Koji Nakano
Order
1
Citations
1165
PageRank
118.13