Title
Asynchronous Memory Machine Models with Barrier Synchronization
Abstract
The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. It was assumed that warps (i.e. groups of threads) on the DMM and the UMM work synchronously in the round-robin manner. However, warps work asynchronously in the actual GPUs, in the sense that warps may be randomly (or arbitrarily) dispatched for execution. The first contribution of this paper is to introduce an asynchronous version of the DMM and the UMM, in which warps are arbitrarily dispatched. Instead, we assume that threads can execute the ``sync threads'' instruction for barrier synchronization. Since the barrier synchronization operation is costly, we should evaluate and minimize the number of barrier synchronization operations performed by parallel algorithms. The second contribution of this paper is to show a parallel algorithm to compute the sum of \boldmath$n$ numbers in optimal computing time and few barrier synchronization steps. Our parallel algorithm computes the sum of \boldmath$n$ numbers in \boldmath$O({n\over w}+l\log n)$ time units and \boldmath$O(\log{l\over w}+\log\log w)$ barrier synchronization steps using \boldmath$wl$ threads both on the asynchronous DMM and on the asynchronous UMM with width \boldmath$w$ and latency \boldmath$l$. We also prove that the computing time is optimal because it matches the theoretical lower bound. Quite surprisingly, the number of barrier synchronization steps and the number of threads are independent of \boldmath$n$. Even if the input size \boldmath$n$ is quite large, our parallel algorithm computes the sum in optimal time units and a fixed number of sync threads using a fixed number of threads.
Year
DOI
Venue
2014
10.1109/ICNC.2012.18
IEICE Transactions
Keywords
DocType
Volume
barrier synchronization,parallel algorithm,barrier synchronization operation,asynchronous umm,asynchronous memory machine models,sync thread,fixed number,asynchronous dmm,theoretical parallel computing model,umm work synchronously,barrier synchronization step,synchronisation,parallel algorithms,computational complexity
Journal
97-D
Issue
ISSN
ISBN
3
1745-1361
978-1-4673-4624-5
Citations 
PageRank 
References 
1
0.36
8
Authors
2
Name
Order
Citations
PageRank
Koji Nakano11165118.13
koji210.36