Title
Affinity-based cluster assignment for unrolled loops
Abstract
To compete performance-wise, modern VLIW processors must have fast clock rates and high instruction-level parallelism (ILP). Partitioning resources (functional units and registers) into clusters allows the processor to be clocked faster, but operand transfers across clusters can easily become a bottleneck. Increasing the number of functional units increases the potential ILP, but only helps if the functional units can be kept busy.

To support these features, optimizations such as loop unrolling must be applied to expose ILP, and instructions must be explicitly assigned to clusters to minimize cross-cluster transfers. In an architecture with homogeneous clusters, the number of functional units of a given type is typically a multiple of the number of clusters. Thus, it is common to unroll a loop so that the number of copies of the loop body is a multiple of the number of clusters. The result is that there is a natural mapping of instructions to clusters, which is often the best mapping. While this mapping can be obvious by inspection, we have found that existing cluster assignment algorithms often miss this natural split. The consequence is an excessive number of inter-cluster transfers, which slows down the loop.

Because we were unable to find an existing cluster-assignment algorithm that performed well for unrolled loops, we developed our own. Our Affinity-Based Clustering (ABC) algorithm has been implemented in a production compiler for the Texas Instruments TMS320C6000, a two-cluster VLIW architecture. It is tailored for exploiting the patterns that result from either manual or compiler-based unrolling. As demonstrated experimentally, it performs well, even when post-unrolling optimizations partially obscure the natural split.
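
The "natural split" described in the abstract can be illustrated with a minimal sketch. This is not the ABC algorithm itself, whose details the abstract does not give. It only assumes that each instruction carries the index of the unrolled copy it came from, and it maps copy k to cluster k mod N. The instruction names, the dependence edges, and the helper functions `natural_split` and `cross_cluster_transfers` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def natural_split(copy_of: Dict[str, int], num_clusters: int) -> Dict[str, int]:
    """Assign each instruction to a cluster based on its unrolled-copy index."""
    return {instr: copy % num_clusters for instr, copy in copy_of.items()}


def cross_cluster_transfers(
    edges: List[Tuple[str, str]], cluster_of: Dict[str, int]
) -> int:
    """Count dependence edges whose producer and consumer are on different clusters."""
    return sum(1 for src, dst in edges if cluster_of[src] != cluster_of[dst])


# Hypothetical loop body (load -> multiply -> store), unrolled twice.
copy_of = {"ld0": 0, "mul0": 0, "st0": 0, "ld1": 1, "mul1": 1, "st1": 1}
edges = [("ld0", "mul0"), ("mul0", "st0"), ("ld1", "mul1"), ("mul1", "st1")]

# Natural split: copy 0 goes to cluster 0, copy 1 goes to cluster 1.
natural = natural_split(copy_of, num_clusters=2)

# A mapping that ignores copy structure: alternate clusters in program order.
alternating = {instr: i % 2 for i, instr in enumerate(copy_of)}

print(cross_cluster_transfers(edges, natural))
print(cross_cluster_transfers(edges, alternating))
```

In this toy case the copy-based mapping keeps every dependence edge inside one cluster, while the mapping that ignores the unrolled structure forces a transfer on every edge. Real unrolled loops also contain values shared between copies, which is where a cluster-assignment heuristic has actual work to do.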
Year: 2002
DOI: 10.1145/514191.514209
Venue: I4CS
Keywords: functional unit, homogeneous clusters, compiler-based unrolling, affinity-based cluster assignment, potential ilp, natural mapping, natural split, partitioned register files, cluster assignment, loop body, loop optimizations, vliw architectures, unrolled loop, excessive number, affinity-based clustering abc algorithms, software pipelining, loop unrolling, loop scheduling, best mapping, register file, loop optimization
Field: Bottleneck, Software pipelining, Computer science, Very long instruction word, Operand, Parallel computing, Real-time computing, Compiler, Loop unrolling, Natural mapping, Loop scheduling
DocType: Conference
ISBN: 1-58113-483-5
Citations: 3
PageRank: 0.71
References: 10
Authors: 3
Name                    Order  Citations  PageRank
Gayathri Krishnamurthy  1      3          0.71
Elana D. Granston       2      153        29.48
Eric J. Stotzer         3      5          1.11