High Performance, Energy Efficiency, and Scalability With GALS Chip Multiprocessors - Citegraph

Paper Info

Title
High Performance, Energy Efficiency, and Scalability With GALS Chip Multiprocessors

Abstract
Chip multiprocessors with globally asynchronous locally synchronous (GALS) clocking styles are promising candidates for processing computationally intensive and energy-constrained workloads. The GALS methodology simplifies clock tree design, provides opportunities to use clock and voltage scaling jointly in system submodules to achieve high energy efficiency, and can also result in easily scalable clocking systems. However, its use typically also introduces performance penalties due to additional communication latency between clock domains. We show that GALS chip multiprocessors (CMPs) with large inter-processor first-in first-out (FIFO) buffers can inherently hide much of the GALS performance penalty while executing applications that have been mapped with few communication loops. In fact, the penalty can be driven to zero with sufficiently large FIFOs and the removal of multiple-loop communication links. We present an example mesh-connected GALS chip multiprocessor and show that, for many DSP workloads, its performance (throughput) is on average less than 1% lower than that of the corresponding synchronous system. Furthermore, adaptive clock and voltage scaling for each processor provides approximately 40% power savings without any performance reduction. These results compare favorably with the GALS uniprocessor, which, compared to the corresponding synchronous uniprocessor, has a reported performance (throughput) reduction of greater than 10% and an energy savings of approximately 25% using dynamic clock and voltage scaling for many general-purpose applications.

Year
2009

DOI
10.1109/TVLSI.2008.2001947

Venue
IEEE Transactions on Very Large Scale Integration Systems

Keywords
adaptive clock, high performance, inter-processor first-inputs-first-outputs, gals uniprocessor comparison, chip multiprocessor, dsp workload, energy efficient, microprocessor chips, gals uniprocessor, clock tree design, scalable, gals chip multiprocessor, gals chip multiprocessors, voltage scaling, clocks, performance penalty, globally asynchronous locally synchronous (gals), clock domain, integrated circuit design, low power, globally asynchronous locally synchronous clocking styles, array processor, system submodules, asynchronous circuits, fifo buffer, computationally-intensive workloads, dynamic clock, scalable clocking systems, multiple-loop communication links, gals performance penalty, energy efficiency, energy-constrained workloads, mesh-connected gals chip multiprocessor, gals methodology, fabrication, frequency, circuits, throughput, scalability

Field
Computer science, Globally asynchronous locally synchronous, Real-time computing, Electronic engineering, Throughput, Dynamic voltage scaling, Uniprocessor system, Parallel computing, Multiprocessing, Chip, Integrated circuit design, Embedded system, Scalability

DocType
Journal

Volume
17

Issue
1

ISSN
1063-8210

Citations
10

PageRank
0.63

References
17

Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Zhiyi Yu	1	158	18.40
Bevan M. Baas	2	295	27.78