Abstract | ||
---|---|---|
The main contribution of this paper is to show optimal algorithms computing the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. We then go on to show that $\Omega({n\over w}+{nl\over p}+l\log n)$ time units are necessary to compute the sum. Finally, we show an optimal parallel algorithm that computes the prefix-sums of n numbers in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-33078-0_8 | ICA3PP |
Keywords | Field | DocType |
optimal parallel prefix-sums algorithm,memory access latency l,time unit,width w,n number,number p,memory machine model,shared memory,discrete memory machine,global memory,log n,parallel algorithm | Binary logarithm,Shared memory,CUDA,Computer science,Parallel algorithm,Parallel computing,Algorithm,Thread (computing),Omega,Machine models,Distributed computing,Parallel prefix | Conference |
Citations | PageRank | References |
15 | 1.01 | 9 |
Authors | ||
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Koji Nakano | 1 | 1165 | 118.13 |