Title
Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic
Abstract
Although low-power, mixed-signal circuitry suffers from the significant overhead of Analog-to-Digital (A/D) conversion, a limited range for information encoding, and susceptibility to noise. This paper addresses these challenges by offering and leveraging the following mathematical insight regarding the vector dot-product, the basic operator in Deep Neural Networks (DNNs): this operator can be reformulated as a wide regrouping of spatially parallel low-bitwidth calculations that are interleaved across the bit partitions of multiple vector elements. As such, the computational building block of our accelerator becomes a wide bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). The bit partitioning permits a lower-resolution ADC, while the wide regrouping avoids an A/D conversion per operation, amortizing the conversion cost across multiple bit partitions of the vector elements. Moreover, the low-bitwidth modules require a smaller encoding range and provide larger margins for noise mitigation. We also employ a switched-capacitor design for our bit-level reformulation of DNN operations: the proposed circuitry performs the regrouped multiplications in the charge domain and accumulates the group's results in its capacitors over multiple cycles. This capacitive accumulation, combined with the wide bit-partitioned regrouping, further reduces the rate of A/D conversions and improves the overall efficiency of the design. Building on this mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BiHiwe, that leverages clustering and hierarchical design to best exploit the power efficiency of the mixed-signal domain and 3D stacking. We also build models for noise, computational non-idealities, and variations. Across ten DNN benchmarks, BiHiwe delivers a 5.5x speedup over Tetris, a leading purely digital 3D-stacked accelerator, with less than 0.5% accuracy loss, achieved through careful treatment of noise, computation error, and various forms of variation. Compared to the RTX 2080 Ti (with tensor cores) and Titan Xp GPUs, both with 8-bit execution, BiHiwe offers 35.4x and 70.1x higher Performance-per-Watt, respectively. Relative to the mixed-signal accelerators RedEye, ISAAC, and PipeLayer, BiHiwe offers 5.5x, 3.6x, and 9.6x improvements in Performance-per-Watt, respectively. The results suggest that BiHiwe is an effective first step on a path that combines mathematics, circuits, and architecture.
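To make the reformulation concrete, the sketch below shows in NumPy how a dot product decomposes into wide groups of low-bitwidth multiply-accumulates followed by digital shift-and-add. It is a minimal illustration under simplifying assumptions, not the paper's implementation: the function names are hypothetical, operands are assumed unsigned 8-bit split into 2-bit partitions, and the charge-domain behavior, signed arithmetic, and ADC quantization are all omitted.

```python
import numpy as np

BITS = 8    # operand bitwidth
PART = 2    # bitwidth of each partition (the low-bitwidth MACs)
NPART = BITS // PART

def partitions(v):
    """Split each element of v into NPART slices of PART bits, LSB first."""
    return [(v >> (PART * j)) & ((1 << PART) - 1) for j in range(NPART)]

def bit_partitioned_dot(x, w):
    """Dot product via interleaved bit-partitioned arithmetic.

    Each inner np.dot is one wide group of PART-bit multiply-accumulates;
    in the accelerator such a group would run in the analog charge domain
    and share one low-resolution ADC, while the shift-and-add below would
    happen digitally after conversion.
    """
    xp, wp = partitions(x), partitions(w)
    acc = 0
    for j in range(NPART):        # bit partition of x
        for k in range(NPART):    # bit partition of w
            group = int(np.dot(xp[j], wp[k]))  # wide low-bitwidth group
            acc += group << (PART * (j + k))   # digital shift-and-add
    return acc

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=1024)   # unsigned 8-bit operands (simplification)
w = rng.integers(0, 256, size=1024)

assert bit_partitioned_dot(x, w) == int(np.dot(x, w))
```

In this toy setup, each inner np.dot stands in for one wide analog group that would share a single A/D conversion, so a 1024-element dot product incurs only 16 conversions rather than one per multiply, which is the amortization the abstract describes.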
Year: 2020
DOI: 10.1145/3410463.3414634
Venue: PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 2020
DocType: Conference
Volume: abs/1906.11915
ISBN: 978-1-4503-8075-1
Citations: 2
PageRank: 0.35
References: 43
Authors: 8
Name                Order  Citations  PageRank
Soroush Ghodrati    1      13         1.94
Hardik Sharma       2      86         3.00
Sean Kinzer         3      7          2.14
Amir Yazdanbakhsh   4      241        15.28
Kambiz Samadi       5      817        43.11
Nam Sung Kim        6      3268       225.99
Doug Burger         7      6160       491.08
H. Esmaeilzadeh     8      1443       69.71