Title
Parallelization libraries: Characterizing and reducing overheads
Abstract
Creating efficient, scalable dynamic parallel runtime systems for chip multiprocessors (CMPs) requires understanding the overheads that manifest at high core counts and small task sizes. In this article, we assess these overheads on Intel's Threading Building Blocks (TBB) and OpenMP. First, we use real hardware and simulations to detail various scheduler and synchronization overheads. We find that these can amount to 47% of TBB benchmark runtime and 80% of OpenMP benchmark runtime. Second, we propose load-balancing techniques, such as occupancy-based and criticality-guided task stealing, to boost performance. Overall, our study provides valuable insights for creating robust, scalable runtime libraries.
Year: 2011
DOI: 10.1145/1952998.1953003
Venue: TACO
Keywords: task stealing, openmp benchmark runtime, threading building blocks, small task size, chip multiprocessors, high core count, openmp, criticality-guided task, tbb benchmark runtime, detail various scheduler, intel threading building blocks, scalable dynamic parallel runtime, parallel libraries, parallelization library, performance, scalable runtime library, load balance
Field: Synchronization, Computer science, Load balancing (computing), Threading (manufacturing), Parallel computing, Chip, Overhead (business), Scalability
DocType: Journal
Volume: 8
Issue: 1
ISSN: 1544-3566
Citations: 11
PageRank: 0.77
References: 23
Authors: 3
Name
Order
Citations
PageRank
Abhishek Bhattacharjee153821.22
Gilberto Contreras241036.87
Margaret Martonosi38647715.76