Title
A 280 mV-to-1.1 V 256b Reconfigurable SIMD Vector Permutation Engine With 2-Dimensional Shuffle in 22 nm Tri-Gate CMOS
Abstract
An ultra-low voltage reconfigurable 4-way to 32-way SIMD vector permutation engine is fabricated in 22 nm tri-gate bulk CMOS, consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clock-less static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file VMIN by 250 mV across PVT variations with a wide dynamic operating range of 280 mV-1.1 V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates, and ultra-low voltage split-output (ULVS) level shifters improving logic VMIN by 150 mV, while enabling peak energy efficiency of 585 GOPS/W measured at 260 mV, 50 °C. The permutation engine achieves: (i) nominal register file performance of 1.8 GHz, 106 mW measured at 0.9 V, 50 °C, (ii) robust register file functionality measured down to 280 mV with peak energy efficiency of 154 GOPS/W, (iii) scalable permute crossbar performance of 2.9 GHz, 69 mW measured at 1.1 V, 50 °C with sub-threshold operation at 240 mV, 10 MHz consuming 19 μW, and (iv) a 64b 4 × 4 matrix transpose algorithm and AoS to SoA conversion with 40%-53% energy savings and 25%-42% improved peak throughput measured at 1.8 GHz, 0.9 V.
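The 2-dimensional shuffle described in the abstract can be illustrated with a small software model. The sketch below is not the authors' circuit or their published algorithm; it only assumes that a vertical shuffle lets each 64b lane of a 256b read come from a different register entry, and that a horizontal shuffle is a byte-wise any-to-any permute driven by a control vector. All function names (`vertical_shuffle`, `horizontal_permute`, `rotate_lanes_ctrl`, `transpose_4x4`) are hypothetical.

```python
import struct

LANES = 4        # 64b lanes in one 256b register (assumed lane width)
LANE_BYTES = 8   # bytes per 64b lane
VEC_BYTES = 32   # 256b register = 32 bytes


def horizontal_permute(vec, ctrl):
    """Byte-wise any-to-any permute: output byte i takes input byte ctrl[i]."""
    assert len(vec) == VEC_BYTES and len(ctrl) == VEC_BYTES
    return bytes(vec[c] for c in ctrl)


def vertical_shuffle(entries, select):
    """Lane i of the result is read from entries[select[i]] at the same lane position."""
    out = bytearray(VEC_BYTES)
    for lane, entry in enumerate(select):
        lo = lane * LANE_BYTES
        out[lo:lo + LANE_BYTES] = entries[entry][lo:lo + LANE_BYTES]
    return bytes(out)


def rotate_lanes_ctrl(k):
    """Control vector that moves lane c to lane (c + k) % LANES."""
    ctrl = [0] * VEC_BYTES
    for dst in range(LANES):
        src = (dst - k) % LANES
        for b in range(LANE_BYTES):
            ctrl[dst * LANE_BYTES + b] = src * LANE_BYTES + b
    return ctrl


def transpose_4x4(rows):
    """Transpose a 4 x 4 matrix of 64b elements held in four 256b registers."""
    # Step 1 (vertical): t[k] lane i = rows[(i + k) % 4] lane i, i.e. a diagonal read.
    t = [vertical_shuffle(rows, [(i + k) % LANES for i in range(LANES)])
         for k in range(LANES)]
    # Step 2 (horizontal): rotate t[k] by k lanes, so u[k] lane r = rows[r] lane (r - k) % 4.
    u = [horizontal_permute(t[k], rotate_lanes_ctrl(k)) for k in range(LANES)]
    # Step 3 (vertical): out[c] lane r = u[(r - c) % 4] lane r = rows[r] lane c.
    return [vertical_shuffle(u, [(r - c) % LANES for r in range(LANES)])
            for c in range(LANES)]


def pack_row(values):
    """Pack four unsigned 64b integers into one 32-byte register image."""
    return struct.pack("<4Q", *values)


def unpack_row(vec):
    """Unpack a 32-byte register image into four unsigned 64b integers."""
    return list(struct.unpack("<4Q", vec))
```

Under the same assumptions, an AoS to SoA conversion of four structures with four 64b fields each is the same data movement: treating each structure as one row, `transpose_4x4` returns one register per field.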
Year
2013
DOI
10.1109/JSSC.2012.2222811
Venue
Solid-State Circuits, IEEE Journal of
Keywords
CMOS integrated circuits, flip-flops, low-power electronics, parallel processing, 2-dimensional shuffle, DETG, P/N dual-ended transmission gate, ULVS level shifter, clock-less static reads, frequency 1.8 GHz, frequency 10 MHz, frequency 2.9 GHz, interleaved folded byte-wise multiplexer layout, peak energy efficiency, power 106 mW, power 19 μW, power 69 mW, register file, scalable permute crossbar, shared gates, size 22 nm, stacked min-delay buffer, temperature 50 C, trigate bulk CMOS, ultra-low voltage reconfigurable SIMD vector permutation engine, ultra-low voltage split-output level shifter, vector flip-flops, voltage 0.9 V, voltage 240 mV, voltage 260 mV, voltage 280 mV to 1.1 V, ${\rm V}_{\rm MIN}$, Single instruction multiple data (SIMD), crossbar, flip-flop, level shifter, near-threshold voltage (NTV), permutation, register file, ultra-low voltage, vector processing
Field
Computer science, Parallel computing, SIMD, Register file, Multiplexer, CMOS, Electronic engineering, Transmission gate, Electronic circuit, Crossbar switch, Low-power electronics
DocType
Journal
Volume
48
Issue
1
ISSN
0018-9200
Citations
4
PageRank
0.61
References
0
Authors
8
Name               Order  Citations  PageRank
S. K. Hsu          1      521        52.06
Amit Agarwal       2      693        72.95
Mark Anders        3      315        50.81
S. Mathew          4      462        76.59
Himanshu Kaul      5      456        51.07
Farhana Sheikh     6      158        22.03
Ram Krishnamurthy  7      650        74.63
Hsu, S.K.          8      4          0.61