Title
Exploring the VLSI Scalability of Stream Processors
Abstract
Stream processors are high-performance programmable processors optimized to run media applications. Recent work has shown these processors to be more area- and energy-efficient than conventional programmable architectures. This paper explores the scalability of stream architectures to future VLSI technologies where over a thousand floating-point units on a single chip will be feasible. Two techniques for increasing the number of ALUs in a stream processor are presented: intracluster and intercluster scaling. These scaling techniques are shown to be cost-efficient up to tens of ALUs per cluster and up to hundreds of arithmetic clusters. A 640-ALU stream processor with 128 clusters and 5 ALUs per cluster is shown to be feasible in 45-nanometer technology, sustaining over 300 GOPS on kernels and providing a 15.3x kernel speedup and an 8.0x application speedup over a 40-ALU stream processor, with a 2% degradation in area per ALU and a 7% degradation in energy dissipated per ALU operation.
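The scaling figures quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the function names and the "speedup divided by ALU ratio" efficiency measure are assumptions introduced here, not definitions taken from the paper. Only the numeric inputs (128 clusters, 5 ALUs per cluster, the 40-ALU baseline, and the 15.3x and 8.0x speedups) come from the abstract.

```python
# Illustrative arithmetic on the abstract's figures.
# Function names and the efficiency measure are hypothetical, not from the paper.

def total_alus(clusters: int, alus_per_cluster: int) -> int:
    """Total ALU count for a machine with identical arithmetic clusters."""
    return clusters * alus_per_cluster


def scaling_efficiency(speedup: float, alu_ratio: float) -> float:
    """Fraction of ideal (linear) speedup achieved for a given increase in ALUs."""
    return speedup / alu_ratio


BASELINE_ALUS = 40        # 40-ALU stream processor named in the abstract
KERNEL_SPEEDUP = 15.3     # kernel speedup reported in the abstract
APP_SPEEDUP = 8.0         # application speedup reported in the abstract

scaled_alus = total_alus(clusters=128, alus_per_cluster=5)
alu_ratio = scaled_alus / BASELINE_ALUS

kernel_eff = scaling_efficiency(KERNEL_SPEEDUP, alu_ratio)
app_eff = scaling_efficiency(APP_SPEEDUP, alu_ratio)

print(f"ALUs: {scaled_alus}, ratio over baseline: {alu_ratio:.0f}x")
print(f"kernel efficiency: {kernel_eff:.1%}, application efficiency: {app_eff:.1%}")
```

Under this assumed measure, the 16x increase in ALUs yields close to linear scaling on kernels and about half of linear scaling on whole applications.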
Year
2003
DOI
10.1109/HPCA.2003.1183534
Venue
HPCA
Keywords
arithmetic cluster, stream processors, high-performance programmable processor, intercluster scaling, stream architecture, application speedup, conventional programmable architecture, stream processor, vlsi scalability, 640-alu stream processor, alu operation, 40-alu stream processor, energy dissipation, energy efficient, cost efficiency, vlsi, chip, floating point arithmetic, floating point unit
Field
Kernel (linear algebra), Cluster (physics), Floating point, Computer science, Parallel computing, Chip, Real-time computing, Stream processing, Very-large-scale integration, Scalability, Speedup
DocType
Conference
ISBN
0-7695-1871-0
Citations
24
PageRank
8.48
References
10
Authors
6
Name               Order  Citations  PageRank
Brucek Khailany    1      1187       118.43
William J. Dally   2      11782      1460.14
Scott Rixner       3      1418       141.51
Ujval J. Kapasi    4      781        106.46
John D. Owens      5      3263       298.85
Brian Towles       6      2564       195.45