Title
Cache-aware iteration space partitioning
Abstract
The need for high performance per watt has led to the development of multi-core systems such as the Intel Core 2 Duo processor and the Intel quad-core Kentsfield processor. Maximal exploitation of the hardware parallelism supported by such systems necessitates the development of concurrent software. This, in part, entails program parallelization and efficient mapping of the parallelized program onto the different cores. The latter affects the load balance between the different cores which in turn has a direct impact on performance. In light of the fact that parallel loops, such as a parallel DO loop in Fortran, account for a large percentage of the total execution time, we focus on the problem of how to efficiently partition the iteration space of (possibly) nested perfect/non-perfect parallel loops. In this regard, one of the key aspects is how to efficiently capture the cache behavior as the cache subsystem is often the main performance bottleneck in multi-core systems. In this paper, we present a novel profile-guided compiler technique for cache-aware partitioning of iteration spaces of parallel loops. We present a case study using a kernel from the industry-standard SPEC CPU benchmark suite.
Year
2008
DOI
10.1145/1345206.1345250
Venue
PPOPP
Keywords
cache-aware iteration space partitioning, cache behavior, different core, multi-core system, parallel loop, intel core, main performance bottleneck, high performance, intel quad-core kentsfield processor, iteration space, non-perfect parallel loop, load balancing, load balance
Field
Kernel (linear algebra), Bottleneck, Load balancing (computing), Cache, Computer science, Parallel computing, Do while loop, Fortran, Compiler, Performance per watt
DocType
Conference
Citations
3
PageRank
0.44
References
13
Authors
5
Name                             Order  Citations  PageRank
Arun Kejariwal                   1      281        26.23
Alexandru Nicolau                2      2265       307.74
Utpal Banerjee                   3      522        75.79
Alexander V. Veidenbaum          4      757        78.24
Constantine D. Polychronopoulos  5      893        129.02