Title
Hardware-based generation of independent subtraces of instructions in clustered processors
Abstract
Multicore chips currently dominate the microprocessor market as designs that improve performance while keeping power consumption in check. However, complex core features must still be considered to provide good performance for existing sequential applications. An effective approach to reducing core complexity without dramatically sacrificing performance is to distribute critical processor structures by using clustered microarchitectures. In these designs, communication latency among clusters is a critical performance bottleneck, and a good steering algorithm is required to reduce intercluster communication. In this paper, we propose a new energy-efficient microarchitectural approach that reduces intercluster communication by detecting and generating independent chains of instructions, referred to as subtraces, from the execution of sequential programs. The devised mechanism has been modeled on an x86-based trace-cache processor, where subtraces are built in the fill unit, stored in a trace cache, and individually steered to different clusters. Experimental results show that the proposal reaches performance speedups of around 7 and 15 percent for point-to-point and bus-based interconnects, respectively, while achieving energy savings of up to 12 percent.
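The idea of splitting a trace into independent subtraces can be illustrated with a small software sketch. This is not the paper's fill-unit hardware mechanism, which the abstract does not detail; it is a minimal illustration that assumes instructions depend on each other only through registers (memory and control dependences are ignored) and uses a union-find structure to group them. The function name `split_into_subtraces` and the `(name, dst, srcs)` tuple layout are hypothetical.

```python
def split_into_subtraces(trace):
    """Group instructions of one trace into register-independent chains.

    `trace` is a list of (name, dst, srcs) tuples in program order, where
    `dst` is the destination register (or None) and `srcs` is a tuple of
    source registers. Hypothetical format, for illustration only.
    """
    parent = list(range(len(trace)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    last_writer = {}  # register -> index of its most recent producer
    for i, (_name, dst, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:
                # Read-after-write dependence: merge consumer with producer.
                parent[find(i)] = find(last_writer[reg])
        if dst is not None:
            last_writer[dst] = i

    # Collect instruction names per group, keeping program order.
    groups = {}
    for i, (name, _dst, _srcs) in enumerate(trace):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


example_trace = [
    ("i0", "r1", ()),
    ("i1", "r2", ()),
    ("i2", "r3", ("r1",)),
    ("i3", "r4", ("r2",)),
    ("i4", "r5", ("r3", "r1")),
]
subtraces = split_into_subtraces(example_trace)
```

In this example, `i0`, `i2`, and `i4` form one chain and `i1`, `i3` form another. Since the two chains share no register values, each could be steered to a different cluster without intercluster communication.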
Year: 2013
DOI: 10.1109/TC.2012.42
Venue: IEEE Transactions on Computers
Keywords: cache storage, computational complexity, microprocessor chips, multiprocessing systems, parallel architectures, performance evaluation, power consumption, bus-based interconnects, clustered microarchitectures, clustered processors, complex core features, core complexity, critical performance bottleneck, critical processor structures, energy-efficient microarchitectural approach, hardware-based generation, independent instruction subtraces, intercluster communication, microprocessor market, multicore chips, performance speedups, point-to-point interconnects, sequential applications, steering algorithm, x86-based trace-cache processor, Algorithm design and analysis, Clustered processors, Clustering algorithms, Multicore processing, Program processors, Radiation detectors, Registers, parallelism, subtraces
Field: x86, Bottleneck, Algorithm design, Computer science, Latency (engineering), Microprocessor, Parallel computing, Real-time computing, Cluster analysis, Multi-core processor, Computational complexity theory, Embedded system
DocType: Journal
Volume: 62
Issue: 5
ISSN: 0018-9340
Citations: 0
PageRank: 0.34
References: 14
Authors: 4
Name            Order  Citations  PageRank
Rafael Ubal     1      322        16.93
Sahuquillo, J.  2      2          1.71
Petit, S.       3      3          2.06
Lopez, P.       4      64         6.88