Title
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors
Abstract
The evolving submicron technology makes it particularly attractive to use decentralized designs. A common form of decentralization adopted in processors is to partition the execution core into multiple clusters, each with a small instruction window and a set of functional units. A number of algorithms have been proposed for distributing instructions among the clusters. The first part of this paper analyzes, qualitatively as well as quantitatively, the effect of various hardware parameters (the type of cluster interconnect, the fetch size, the cluster issue width, the cluster window size, and the number of clusters) on the performance of different instruction distribution algorithms. The study shows that the relative performance of the algorithms is very sensitive to these hardware parameters, and that the algorithms that perform relatively better with four or fewer clusters are generally not the best ones for a larger number of clusters. This is important because, with the imminent increase in transistor budgets, more clusters are expected to be integrated on a single chip. The second part of the paper investigates alternative interconnects that provide scalable performance as the number of clusters increases. In particular, it investigates two hierarchical interconnects, a single ring of crossbars and multiple rings of crossbars, as well as instruction distribution algorithms that take advantage of these interconnects. Our study shows that these new interconnects, combined with appropriate distribution techniques, achieve an IPC (instructions per cycle) 15-20 percent higher than the most scalable existing configuration, and within 2 percent of that achieved by a hypothetical ideal processor with a 1-cycle-latency crossbar interconnect. These results confirm the utility and applicability of hierarchical interconnects and hierarchical distribution algorithms in clustered processors.
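The abstract does not spell out the distribution algorithms or the interconnect timing, so the following is only a toy sketch under stated assumptions. It compares the average inter-cluster latency of a flat unidirectional ring with that of a ring of crossbars, and pairs the latency models with a generic dependence-based steering heuristic. The function names (`ring_latency`, `ring_of_crossbars_latency`, `average_latency`, `steer`), the costs of one cycle per ring hop and one cycle per crossbar traversal, and the tie-breaking rule are all hypothetical. They are not the authors' algorithms or parameters.

```python
from itertools import product
from typing import Callable, Optional, Sequence


def ring_latency(src: int, dst: int, n: int) -> int:
    """Cycles from src to dst on a unidirectional ring of n clusters.

    Assumption: one cycle per ring hop.
    """
    return (dst - src) % n


def ring_of_crossbars_latency(src: int, dst: int, n: int, group: int) -> int:
    """Cycles from src to dst when clusters are grouped into crossbars of
    `group` clusters and the crossbars are linked by a unidirectional ring.

    Assumptions: one cycle through a crossbar, plus one cycle per ring hop
    between crossbars.
    """
    assert n % group == 0, "clusters must divide evenly into crossbar groups"
    if src == dst:
        return 0
    num_groups = n // group
    g_src, g_dst = src // group, dst // group
    # Same crossbar: a single crossbar traversal; otherwise add the ring hops.
    return 1 + (g_dst - g_src) % num_groups


def average_latency(latency: Callable[..., int], n: int, **kw) -> float:
    """Mean latency over all ordered pairs of distinct clusters."""
    pairs = [(s, d) for s, d in product(range(n), repeat=2) if s != d]
    return sum(latency(s, d, n, **kw) for s, d in pairs) / len(pairs)


def steer(producers: Sequence[int], loads: Sequence[int], capacity: int,
          latency: Callable[[int, int], int]) -> Optional[int]:
    """Hypothetical dependence-based steering heuristic.

    Chooses the non-full cluster that minimizes the worst-case latency of
    operands arriving from the producer clusters, breaking ties by lower
    window occupancy and then by lower cluster index. Returns None when
    every window is full (dispatch would stall).
    """
    best, best_key = None, None
    for c, load in enumerate(loads):
        if load >= capacity:
            continue  # this cluster's instruction window is full
        comm = max((latency(p, c) for p in producers), default=0)
        key = (comm, load)
        if best_key is None or key < best_key:
            best, best_key = c, key
    return best


if __name__ == "__main__":
    n = 16
    print(average_latency(ring_latency, n))                        # 8.0
    print(average_latency(ring_of_crossbars_latency, n, group=4))  # 2.6
    loads = [0] * n
    loads[4], loads[5], loads[6], loads[7] = 3, 8, 1, 2
    xbar = lambda s, d: ring_of_crossbars_latency(s, d, n, 4)
    print(steer([5], loads, capacity=8, latency=xbar))             # 6
```

Under these assumptions, the average distance between 16 clusters drops from 8 cycles on the flat ring to 2.6 cycles with four 4-cluster crossbars on a ring. These figures come from the toy model only, not from the paper's measurements. In the steering example, the producer's own cluster (5) is full, so the heuristic falls back to the least-loaded cluster on the same crossbar.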
Year
2005
DOI
10.1109/TPDS.2005.128
Venue
IEEE Trans. Parallel Distrib. Syst.
Keywords
interconnection architectures,scalability aspects,cluster window size,instruction distribution algorithm,clustered processors,pipeline processors,cluster issue width,instruction distribution algorithms,appropriate distribution technique,alternate interconnects,hierarchical interconnects,new interconnects,clustered processor architecture,different instruction distribution algorithm,fewer cluster,load balancing and task assignment,hierarchical distribution algorithm,chip,resource allocation,instructions per cycle,distributed algorithm,functional unit,load balance,instruction sets
Field
Cluster (physics),Instruction set,Computer science,Real-time computing,Crossbar switch,Distributed computing,Instructions per cycle,Parallel computing,Algorithm,Chip,Resource allocation,Instruction window,Scalability
DocType
Journal
Volume
16
Issue
10
ISSN
1045-9219
Citations
1
PageRank
0.35
References
15
Authors
2
Name | Order | Citations | PageRank
Aneesh Aggarwal | 1 | 202 | 16.91
Manoj Franklin | 2 | 158 | 11.38