Title
Fast Barriers for Scalable ccNUMA Systems
Abstract
As multiprocessors systems become larger and larger and network latency rapidly approaches thousands of processor cycles, the primary factor in determining a barrier algorithmýs performance is the number of serial network latencies it requires. Existing barrier algorithms require at least O(logN) round trip message latencies to perform a single barrier operation on an N-node shared memory multiprocessor. In addition, existing barrier algorithms are not well tuned in terms of how they interact with modern shared memory systems, which leads to an excessive number of message exchanges to signal barrier completion. The contributions of this paper are threefold. First, we identify and quantify the performance deficiencies of conventional barrier implementations when they are executed on real (non-idealized) hardware. Second, we propose a queue-based barrier algorithm that has effectively O(1) time complexity as measured in round trip message latencies. Third, we demonstrate how matching the barrier implementation to the way that modern shared memory systems operate can improve performance dramatically by exploiting a hardware write-update (PUT) mechanism for signaling. The resulting barrier algorithm only costs one serialized round trip message latency to perform a barrier operation across N processors. Using a cycle-accurate execution-driven simulator of a future-generation SGI multiprocessor, we show that with no special hardware support our queue-based barrier outperforms OpenMPýs LL/SC-based barrier implementation by a factor of 7.9 on 256 processors. With hardware that supports a coherent PUT operation, our queue-based barrier outperforms OpenMP barriers by a factor of 94 and outperforms barriers based on SGIýs memory controller-based atomic operations by a factor of 6.5 on 256 processors.
Year
DOI
Venue
2005
10.1109/ICPP.2005.39
ICPP
Keywords
Field
DocType
conventional barrier implementation,openmp barrier,queue-based barrier,barrier completion,barrier operation,barrier implementation,barrier algorithm,existing barrier algorithm,sc-based barrier implementation,scalable ccnuma systems,queue-based barrier algorithm,fast barriers,hardware,time measurement,scalability,computer networks,message passing,time complexity,microprogramming,queueing theory,computational complexity
Microcode,Shared memory,Computer science,Parallel computing,Queue,Distributed memory,Queueing theory,Message passing,Memory controller,Distributed computing,Memory barrier
Conference
ISSN
ISBN
Citations 
0190-3918
0-7695-2380-3
5
PageRank 
References 
Authors
0.56
13
2
Name
Order
Citations
PageRank
Liqun Cheng1492.96
John B. Carter21785162.82