Title
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers
Abstract
We examine the ability of CMPs, with their lower on-chip communication latencies, to exploit data parallelism at the inner-loop granularities commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data-parallel tasks, we present barrier filters, a mechanism for fast barrier synchronization on chip multiprocessors that enables vector computations to be efficiently distributed across the cores of a CMP. We ensure that all threads arriving at a barrier require an unavailable cache line to proceed, and, by placing additional hardware in the shared portions of the memory subsystem, we starve their requests until all threads have arrived. Specifically, our approach uses invalidation requests to both make cache lines unavailable and identify when a thread has reached the barrier. We examine two types of barrier filters, one synchronizing through instruction cache lines and the other through data cache lines.
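The abstract describes partitioning each vectorizable inner loop across the cores of a CMP and synchronizing with a barrier after every pass, which is why barrier latency dominates at this granularity. Below is a minimal sketch of that usage pattern, using a POSIX barrier as a software stand-in for the proposed hardware barrier filter; NTHREADS, N, PASSES, and the daxpy-style kernel are illustrative assumptions, not details from the paper.

/* Sketch only: distributing an inner loop across cores with a barrier
 * after each pass. A pthread barrier stands in for the paper's
 * hardware barrier filter; sizes and the kernel are illustrative. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N        1024
#define PASSES   8

static double x[N], y[N];
static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long tid = (long)arg;
    long chunk = N / NTHREADS;
    long lo = tid * chunk;
    long hi = (tid == NTHREADS - 1) ? N : lo + chunk;

    for (int pass = 0; pass < PASSES; pass++) {
        /* Each thread computes its slice of the vectorizable inner loop. */
        for (long i = lo; i < hi; i++)
            y[i] = 2.0 * x[i] + y[i];

        /* Every inner-loop pass ends in a barrier, so barrier cost is
         * paid at a very fine granularity. */
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 0.0; }

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    pthread_barrier_destroy(&barrier);

    printf("y[0] = %f\n", y[0]); /* expect 16.0 after 8 passes */
    return 0;
}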
Year
2006
DOI
10.1109/MICRO.2006.23
Venue
MICRO
Keywords
chip multiprocessors, cache line, data cache line, exploiting fine-grained data parallelism, fast barrier synchronization, present barrier filter, barrier filter, different barrier mechanism, fast barriers, data parallelism, fine-grained data, unavailable cache line, instruction cache line, vector machine, synchronisation
Field
Synchronization, Computer science, Cache, CPU cache, Parallel computing, Synchronizing, Thread (computing), Chip, Exploit, Real-time computing, Data parallelism
DocType
Conference
ISSN
1072-4451
ISBN
0-7695-2732-9
Citations
32
PageRank
1.67
References
20
Authors
6
Name
Order
Citations
PageRank
Jack Sampson139832.45
Ruben Gonzalez2321.67
Jean-François Collard324721.24
Norman P. Jouppi46042791.53
M. Schlansker512311.86
Brad Calder64145251.59