Title
Partitioning Multi-Threaded Processors with a Large Number of Threads
Abstract
Today's general-purpose processors are increasingly using multithreading in order to better leverage the additional on-chip real estate available with each technology generation. Simultaneous Multi-Threading (SMT) was originally proposed as a large dynamic superscalar processor with monolithic hardware structures shared among all threads. Intel's Hyper-Threaded Pentium 4 processor partitions the queue structures among two threads, demonstrating more balanced performance by reducing the hoarding of structures by a single thread. IBM's Power5 processor is a 2-way Chip Multiprocessor (CMP) of SMT processors, each supporting 2 threads, which significantly reduces design complexity and can improve power efficiency. This paper examines processor partitioning options for larger numbers of threads on a chip. While growing transistor budgets permit four and eight-thread processors to be designed, design complexity, power dissipation, and wire scaling limitations create significant barriers to their actual realization. We explore the design choices of sharing, or of partitioning and distributing, the front end (instruction cache, instruction fetch, and dispatch), the execution units and associated state, as well as the L1 Dcache banks, in a Clustered Multi-Threaded (CMT) processor. We show that the best performance is obtained by restricting the sharing of the L1 Dcache banks and the execution engines among threads. On the other hand, significant sharing of the front-end resources is the best approach. When compared against large monolithic SMT processors, a CMT processor provides very competitive IPC performance, on average 90-96% of that of partitioned SMT, while being more scalable and much more power efficient. In a CMP organization, the gap between SMT and CMT processors shrinks further, making a CMP of CMT processors a highly viable alternative for the future.
Year
2005
DOI
10.1109/ISPASS.2005.1430566
Venue
ISPASS
Keywords
power dissipation, system on chip, multi threading, instruction sets, resource allocation
Field
Multithreading, System on a chip, POWER5, Instruction set, Computer science, Cache, Parallel computing, Multiprocessing, Thread (computing), Pentium
DocType
Conference
ISBN
0-7803-8965-4
Citations
14
PageRank
1.04
References
24
Authors
4
Name | Order | Citations | PageRank
A. El-Moursy | 1 | 121 | 8.37
R. Garg | 2 | 14 | 1.04
Albonesi, David H. | 3 | 2091 | 165.88
Sandhya Dwarkadas | 4 | 3504 | 257.31