Title |
---|
Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic |
Abstract |
---|
Albeit low-power, mixed-signal circuitry suffers from the significant overhead of Analog-to-Digital (A/D) conversion, a limited range for information encoding, and susceptibility to noise. This paper aims to address these challenges by offering and leveraging the following mathematical insight regarding vector dot-product, the basic operator in Deep Neural Networks (DNNs): this operator can be reformulated as a wide regrouping of spatially parallel low-bitwidth calculations that are interleaved across the bit partitions of multiple elements of the vectors. As such, the computational building block of our accelerator becomes a wide bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). This bit-partitioning permits a lower-resolution ADC, while the wide regrouping alleviates the need for an A/D conversion per operation, amortizing its cost across multiple bit-partitions of the vector elements. Moreover, the low-bitwidth modules require a smaller encoding range and provide larger margins for noise mitigation. We also utilize a switched-capacitor design for our bit-level reformulation of DNN operations. The proposed switched-capacitor circuitry performs the regrouped multiplications in the charge domain and accumulates the results of each group in its capacitors over multiple cycles. This capacitive accumulation, combined with the wide bit-partitioned regrouping, reduces the rate of A/D conversions, further improving the overall efficiency of the design.
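For concreteness, here is a minimal NumPy sketch of the bit-partitioned, interleaved dot-product described above. The partition width `b`, the helper names, and the unsigned 8-bit operands are our illustrative assumptions, not details fixed by the paper:

```python
import numpy as np

def bit_partition(v, bits=8, b=2):
    """Split each element of an unsigned-integer vector into bits // b
    partitions of b bits each, least-significant partition first."""
    mask = (1 << b) - 1
    return [(v >> (j * b)) & mask for j in range(bits // b)]

def interleaved_dot(x, w, bits=8, b=2):
    """Dot product recomputed as a sum of wide, low-bitwidth MAC groups.

    Each inner np.dot multiplies one b-bit partition of every element of x
    with one b-bit partition of every element of w: the wide group that the
    accelerator evaluates in the analog domain behind a single shared ADC.
    The shift-and-add recombination across groups is plain digital logic.
    """
    xp = bit_partition(x, bits, b)
    wp = bit_partition(w, bits, b)
    total = 0
    for j, xj in enumerate(xp):
        for k, wk in enumerate(wp):
            group = int(np.dot(xj, wk))       # one wide low-bitwidth MAC group -> one conversion
            total += group << ((j + k) * b)   # digital shift-and-add recombination
    return total

# Sanity check against the full-precision dot product.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=16, dtype=np.int64)
w = rng.integers(0, 256, size=16, dtype=np.int64)
assert interleaved_dot(x, w) == int(np.dot(x, w))
```

Each call to `np.dot` inside the loop stands in for one wide analog MAC group whose single accumulated result is digitized once, which is how the design amortizes the cost of A/D conversion across many bit-partitions of the vector elements.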
With this mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BiHiwe, that leverages clustering and hierarchical design to best exploit the power-efficiency of the mixed-signal domain and 3D stacking. We also build models for noise, computational non-idealities, and variations. For ten DNN benchmarks, BiHiwe delivers 5.5x speedup over Tetris, a leading purely-digital 3D-stacked accelerator, with less than 0.5% accuracy loss, achieved by careful treatment of noise, computation error, and various forms of variation. Compared to the RTX 2080 Ti (with tensor cores) and Titan Xp GPUs, both with 8-bit execution, BiHiwe offers 35.4x and 70.1x higher Performance-per-Watt, respectively. Relative to the mixed-signal RedEye, ISAAC, and PipeLayer, BiHiwe offers 5.5x, 3.6x, and 9.6x improvements in Performance-per-Watt, respectively. The results suggest that BiHiwe is an effective initial step on a path that combines mathematics, circuits, and architecture.
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3410463.3414634 | PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 2020 |
DocType | Volume | ISBN |
---|---|---|
Conference | abs/1906.11915 | 978-1-4503-8075-1 |
Citations | PageRank | References |
---|---|---|
2 | 0.35 | 43 |
Authors |
---|
8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Soroush Ghodrati | 1 | 13 | 1.94 |
Hardik Sharma | 2 | 86 | 3.00 |
Sean Kinzer | 3 | 7 | 2.14 |
Amir Yazdanbakhsh | 4 | 241 | 15.28 |
Kambiz Samadi | 5 | 817 | 43.11 |
Nam Sung Kim | 6 | 3268 | 225.99 |
Doug Burger | 7 | 6160 | 491.08 |
H. Esmaeilzadeh | 8 | 1443 | 69.71 |