Title
Complexity Effective Bypass Networks
Abstract
Superscalar processors depend heavily on broadcast-based bypass networks to improve performance by exploiting more instruction-level parallelism. However, increasing clock speeds and shrinking technology make broadcasting slower and more difficult to implement, especially for wide-issue and deeply pipelined processors. High-latency bypass networks delay the execution of dependent instructions, which can result in significant performance loss. In this paper, we first perform a detailed analysis of the performance impact of delays in the execution of dependent instructions caused by high-latency bypass networks. We found that this impact varies with the data dependences present in a program and with the types of instructions that make up the program code. We also found that the impact varies significantly with the hardware configuration, and that with a high-latency bypass network, the processor hardware critical for near-maximal performance reduces considerably. We then propose Single FU bypass networks to reduce the bypass network latency, in which results from a functional unit (FU) are forwarded only to itself. The new design is based on the observations that an instruction's result is mostly required by just one other instruction and that the operands of many instructions come from a single other instruction. The new bypass network significantly reduces the data forwarding latency while incurring only a small impact (about 2% for most of the SPEC2K benchmarks) on the instructions per cycle (IPC) count. In return, the reduced bypass latency can potentially increase the clock speed. Single FU bypass networks are also much more scalable than broadcast-based bypass networks for the wider and more deeply pipelined microprocessors of the future.
Year: 2009
DOI: 10.1007/978-3-642-00904-4_11
Venue: T. HiPEAC
Keywords: new bypass network design, complexity effective bypass networks, single fu bypass network, broadcast-based bypass network, bypass network latency, performance impact, reduced bypass latency, new bypass network result, dependent instruction, high latency bypass network, clock speed, instructions per cycle, network delay, network design
Field: Instructions per cycle, Instruction-level parallelism, Broadcasting, Network planning and design, Latency (engineering), Computer science, Parallel computing, Operand, Real-time computing, Clock rate, Embedded system, Scalability
DocType: Journal
Volume: 2
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 23
Authors: 1
Name: Aneesh Aggarwal
Order: 1
Citations: 202
PageRank: 16.91