Title | ||
---|---|---|
A 280 mV-to-1.1 V 256b Reconfigurable SIMD Vector Permutation Engine With 2-Dimensional Shuffle in 22 nm Tri-Gate CMOS |
Abstract | ||
---|---|---|
An ultra-low voltage reconfigurable 4-way to 32-way SIMD vector permutation engine is fabricated in 22 nm tri-gate bulk CMOS, consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clock-less static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file VMIN by 250 mV across PVT variations with a wide dynamic operating range of 280 mV-1.1 V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates, and ultra-low voltage split-output (ULVS) level shifters improving logic VMIN by 150 mV, while enabling peak energy efficiency of 585 GOPS/W measured at 260 mV, 50 °C. The permutation engine achieves: (i) nominal register file performance of 1.8 GHz, 106 mW measured at 0.9 V, 50 °C, (ii) robust register file functionality measured down to 280 mV with peak energy efficiency of 154 GOPS/W, (iii) scalable permute crossbar performance of 2.9 GHz, 69 mW measured at 1.1 V, 50 °C with sub-threshold operation at 240 mV, 10 MHz consuming 19 μW, and (iv) a 64b 4 × 4 matrix transpose algorithm and AoS to SoA conversion with 40%-53% energy savings and 25%-42% improved peak throughput measured at 1.8 GHz, 0.9 V. |
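
The 2-dimensional shuffle named in the abstract can be illustrated with a small software model. What follows is only a sketch under stated assumptions, not the chip's datapath: a 256b entry is modeled as 32 bytes holding four 64b lanes, and the helper names (`permute_bytes`, `rotate_lanes_sel`, `vertical_read`, `transpose_4x4`) are hypothetical. The transpose is built from a diagonal read across entries (a vertical shuffle), a byte-wise any-to-any permute (a horizontal shuffle), and a diagonal write back. This is one common way to compose such a transpose and is not confirmed to be the authors' schedule.

```python
LANES = 4        # assumed: four 64b elements per 256b entry
ELEM_BYTES = 8   # 64 bits per element
ROW_BYTES = LANES * ELEM_BYTES  # 32 bytes = 256 bits


def permute_bytes(src: bytes, sel: list[int]) -> bytes:
    """Any-to-any byte permute: output byte i is source byte sel[i]."""
    assert len(src) == ROW_BYTES and len(sel) == ROW_BYTES
    return bytes(src[s] for s in sel)


def rotate_lanes_sel(r: int) -> list[int]:
    """Build a select vector that moves 64b lane c to lane (c + r) % LANES."""
    sel = [0] * ROW_BYTES
    for c in range(LANES):
        dst = (c + r) % LANES
        for b in range(ELEM_BYTES):
            sel[dst * ELEM_BYTES + b] = c * ELEM_BYTES + b
    return sel


def vertical_read(regs: list[bytes], r: int) -> bytes:
    """Diagonal read: lane c is taken from entry (c + r) % LANES."""
    return b"".join(
        regs[(c + r) % LANES][c * ELEM_BYTES:(c + 1) * ELEM_BYTES]
        for c in range(LANES)
    )


def transpose_4x4(regs: list[bytes]) -> list[bytes]:
    """Transpose a 4 x 4 matrix of 64b elements stored one row per entry."""
    out = [bytearray(ROW_BYTES) for _ in range(LANES)]
    for r in range(LANES):
        # Vertical shuffle on read, then horizontal shuffle through the permute.
        row = permute_bytes(vertical_read(regs, r), rotate_lanes_sel(r))
        # Vertical shuffle on write: lane i lands in entry (i - r) % LANES.
        for i in range(LANES):
            dst = (i - r) % LANES
            lo, hi = i * ELEM_BYTES, (i + 1) * ELEM_BYTES
            out[dst][lo:hi] = row[lo:hi]
    return [bytes(o) for o in out]
```

In this model, AoS to SoA conversion of four structures with four 64b fields each is the same operation: if each entry holds one structure, the transposed entries each hold one field across all four structures.
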
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/JSSC.2012.2222811 | Solid-State Circuits, IEEE Journal of |
Keywords | Field | DocType |
CMOS integrated circuits,flip-flops,low-power electronics,parallel processing,2-dimensional shuffle,DETG,P/N dual-ended transmission gate,ULVS level shifter,clock-less static reads,frequency 1.8 GHz,frequency 10 MHz,frequency 2.9 GHz,interleaved folded byte-wise multiplexer layout,peak energy efficiency,power 106 mW,power 19 μW,power 69 mW,register file,scalable permute crossbar,shared gates,size 22 nm,stacked min-delay buffer,temperature 50 °C,trigate bulk CMOS,ultra-low voltage reconfigurable SIMD vector permutation engine,ultra-low voltage split-output level shifter,vector flip-flops,voltage 0.9 V,voltage 240 mV,voltage 260 mV,voltage 280 mV to 1.1 V,${\rm V}_{\rm MIN}$,Single instruction multiple data (SIMD),crossbar,flip-flop,level shifter,near-threshold voltage (NTV),permutation,register file,ultra-low voltage,vector processing | Computer science,Parallel computing,SIMD,Register file,Multiplexer,CMOS,Electronic engineering,Transmission gate,Electronic circuit,Crossbar switch,Low-power electronics | Journal |
Volume | Issue | ISSN |
48 | 1 | 0018-9200 |
Citations | PageRank | References |
4 | 0.61 | 0 |
Authors | ||
8 |
Name | Order | Citations | PageRank |
---|---|---|---|
S. K. Hsu | 1 | 521 | 52.06 |
Amit Agarwal | 2 | 693 | 72.95 |
Mark Anders | 3 | 315 | 50.81 |
S. Mathew | 4 | 462 | 76.59 |
Himanshu Kaul | 5 | 456 | 51.07 |
Farhana Sheikh | 6 | 158 | 22.03 |
Ram Krishnamurthy | 7 | 650 | 74.63 |
Hsu, S.K. | 8 | 4 | 0.61 |