Asynchronous Memory Machine Models with Barrier Synchronization - Citegraph

Paper Info

Title
Asynchronous Memory Machine Models with Barrier Synchronization

Abstract
The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. It was assumed that warps (i.e. groups of threads) on the DMM and the UMM work synchronously in the round-robin manner. However, warps work asynchronously in the actual GPUs, in the sense that warps may be randomly (or arbitrarily) dispatched for execution. The first contribution of this paper is to introduce an asynchronous version of the DMM and the UMM, in which warps are arbitrarily dispatched. Instead, we assume that threads can execute the ``sync threads'' instruction for barrier synchronization. Since the barrier synchronization operation is costly, we should evaluate and minimize the number of barrier synchronization operations performed by parallel algorithms. The second contribution of this paper is to show a parallel algorithm to compute the sum of \boldmath$n$ numbers in optimal computing time and few barrier synchronization steps. Our parallel algorithm computes the sum of \boldmath$n$ numbers in \boldmath$O({n\over w}+l\log n)$ time units and \boldmath$O(\log{l\over w}+\log\log w)$ barrier synchronization steps using \boldmath$wl$ threads both on the asynchronous DMM and on the asynchronous UMM with width \boldmath$w$ and latency \boldmath$l$. We also prove that the computing time is optimal because it matches the theoretical lower bound. Quite surprisingly, the number of barrier synchronization steps and the number of threads are independent of \boldmath$n$. Even if the input size \boldmath$n$ is quite large, our parallel algorithm computes the sum in optimal time units and a fixed number of sync threads using a fixed number of threads.

Year	DOI	Venue
2014	10.1109/ICNC.2012.18	IEICE Transactions
Keywords	DocType	Volume
barrier synchronization,parallel algorithm,barrier synchronization operation,asynchronous umm,asynchronous memory machine models,sync thread,fixed number,asynchronous dmm,theoretical parallel computing model,umm work synchronously,barrier synchronization step,synchronisation,parallel algorithms,computational complexity	Journal	97-D
Issue	ISSN	ISBN
3	1745-1361	978-1-4673-4624-5
Citations	PageRank	References
1	0.36	8
Authors
2

Authors (2 rows)

Cited by (1 rows)

References (8 rows)

Name	Order	Citations	PageRank
Koji Nakano	1	1165	118.13
koji	2	1	0.36

1