Title
Reducing task creation and termination overhead in explicitly parallel programs
Abstract
There has been a proliferation of task-parallel programming systems to address the requirements of multicore programmers. Current production task-parallel systems include Cilk++, Intel Threading Building Blocks, Java Concurrency, .Net Task Parallel Library, and OpenMP 3.0; current research task-parallel languages include Cilk, Chapel, Fortress, X10, and Habanero-Java (HJ). It is desirable for programmers to express all the parallelism intrinsic to their algorithm in their code for forward scalability and portability, but the overhead incurred by doing so can be prohibitively large in today's systems. In this paper, we address the problem of reducing the total overhead incurred by a program due to excessive task creation and termination. We introduce a transformation framework to optimize task-parallel programs with finish, forall and next statements. Our approach includes elimination of redundant task creation and termination operations as well as strength reduction of termination operations (finish) to lighter-weight synchronizations (next). Experimental results were obtained on three platforms: a dual-socket 128-thread (16-core) Niagara T2 system, a quad-socket 16-way Intel Xeon SMP, and a quad-socket 32-way Power7 SMP. The results show maximum speedups of 66.7x, 11.25x, and 23.1x on the three platforms, respectively, and geometric-mean performance improvements of 4.6x, 2.1x, and 6.4x, respectively, relative to non-optimized parallel codes. The original benchmarks in this study were written with medium-grained parallelism; a larger relative improvement can be expected for programs written with finer-grained parallelism. However, even for the medium-grained parallel benchmarks studied in this paper, the significant improvement obtained by the transformation framework underscores the importance of the compiler optimizations introduced in this paper.
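The strength reduction of finish to next mentioned in the abstract can be illustrated with a rough sketch. This is not the paper's implementation or HJ code: Python threads stand in for tasks, `join` plays the role of finish, `threading.Barrier.wait` plays the role of next, and the names `step`, `run_unoptimized`, and `run_optimized` are hypothetical. The sketch assumes an input of at least three cells.

```python
import threading


def step(src, dst, i):
    # Averaging stencil on interior cell i (illustrative workload only).
    dst[i] = (src[i - 1] + src[i + 1]) / 2.0


def run_unoptimized(data, iters):
    """Loop around 'finish { forall }': tasks are created and joined every iteration."""
    a, b = list(data), list(data)
    created = 0
    for _ in range(iters):
        workers = [threading.Thread(target=step, args=(a, b, i))
                   for i in range(1, len(a) - 1)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()  # stands in for the end of a finish scope
        created += len(workers)
        a, b = b, a  # the freshly written buffer becomes the next source
    return a, created


def run_optimized(data, iters):
    """Tasks are created once; a barrier replaces the per-iteration join."""
    bufs = [list(data), list(data)]
    n = len(data) - 2  # number of interior cells, assumed >= 1
    barrier = threading.Barrier(n)

    def worker(i):
        for t in range(iters):
            src, dst = bufs[t % 2], bufs[(t + 1) % 2]
            step(src, dst, i)
            barrier.wait()  # stands in for next: all workers finish step t first

    workers = [threading.Thread(target=worker, args=(i,))
               for i in range(1, n + 1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # a single finish around the whole loop
    return bufs[iters % 2], n
```

Both functions compute the same values, but for `iters` iterations over `n` interior cells the first creates `iters * n` threads while the second creates only `n`.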
Year: 2010
DOI: 10.1145/1854273.1854298
Venue: PACT
Keywords: termination overhead, termination operation, 32-way power7 smp, medium-grained parallelism, task creation, task-parallel program, current production task-parallel system, task-parallel programming system, finer-grained parallelism, transformation framework, intel threading building blocks, current research task-parallel language, geometric mean, parallel systems, compiler optimization, optimization
Field: Instruction-level parallelism, Parallel Extensions, Implicit parallelism, Task parallelism, Computer science, Parallel computing, Real-time computing, Optimizing compiler, Data parallelism, Cilk, Multi-core processor, Operating system
DocType: Conference
ISBN: 978-1-5090-5032-1
Citations: 17
PageRank: 1.05
References: 12
Authors: 4
Name, Order, Citations, PageRank
Jisheng Zhao, 1, 480, 24.34
Jun Shirako, 2, 433, 34.56
V. Krishna Nandivada, 3, 142, 16.26
Vivek Sarkar, 4, 4318, 409.41