Title
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
Abstract
Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous compute intensity. Fully realizing the potential of acceleration in this domain requires understanding and leveraging algorithmic properties of DNNs. This paper builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the required bitwidth varies significantly across DNNs and may even need to be adjusted for each layer individually. A fixed-bitwidth accelerator would therefore either offer limited benefits in order to accommodate the worst-case bitwidth requirements, or inevitably degrade final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator composed of an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture minimizes computation and communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of Bit Fusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle-accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss [1] and Stripes [2]. In the same area, frequency, and process technology, Bit Fusion offers 3.9X speedup and 5.1X energy savings over Eyeriss. Compared to Stripes, Bit Fusion provides 2.6X speedup and 3.9X energy reduction at the 45 nm node when Bit Fusion's area and frequency are set to those of Stripes. Scaled to the 16 nm GPU technology node, Bit Fusion almost matches the performance of a 250-Watt Titan Xp that uses 8-bit vector instructions, while consuming merely 895 milliwatts of power.
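The core idea described in the abstract, fusing low-bitwidth processing elements into wider multipliers on demand, can be illustrated with a small sketch. The Python snippet below is not the authors' design (the paper describes a spatial Verilog microarchitecture with signed/unsigned support); it only shows, assuming unsigned operands, how 2-bit partial products that are shifted and summed reproduce 4- and 8-bit multiplications, so the same pool of 2-bit units can serve layers that require different bitwidths. The names bitbrick_multiply and fused_multiply are illustrative, not taken from the paper.

# Minimal sketch of bit-level fusion: a pool of 2-bit multipliers ("BitBricks")
# is logically fused to perform 2-, 4-, or 8-bit multiplications by summing
# shifted 2x2-bit partial products. Unsigned arithmetic only; hypothetical names.

def bitbrick_multiply(a2: int, b2: int) -> int:
    """One BitBrick: multiply two unsigned 2-bit digits (values 0..3)."""
    assert 0 <= a2 < 4 and 0 <= b2 < 4
    return a2 * b2

def split_into_2bit_digits(x: int, bitwidth: int) -> list:
    """Decompose an unsigned `bitwidth`-bit operand into 2-bit digits, LSB first."""
    return [(x >> shift) & 0b11 for shift in range(0, bitwidth, 2)]

def fused_multiply(a: int, b: int, bitwidth: int) -> int:
    """Fuse (bitwidth/2)^2 BitBricks to multiply two unsigned `bitwidth`-bit operands."""
    a_digits = split_into_2bit_digits(a, bitwidth)
    b_digits = split_into_2bit_digits(b, bitwidth)
    product = 0
    for i, ad in enumerate(a_digits):        # digit i carries weight 2^(2i)
        for j, bd in enumerate(b_digits):    # digit j carries weight 2^(2j)
            product += bitbrick_multiply(ad, bd) << (2 * (i + j))
    return product

if __name__ == "__main__":
    assert fused_multiply(173, 91, bitwidth=8) == 173 * 91   # 16 BitBricks
    assert fused_multiply(9, 13, bitwidth=4) == 9 * 13       # 4 BitBricks
    assert fused_multiply(3, 2, bitwidth=2) == 3 * 2         # 1 BitBrick
    print("variable-bitwidth multiplies via fused 2-bit BitBricks: OK")

In this sketch an 8-bit multiply consumes 16 of the 2-bit units, a 4-bit multiply consumes 4, and a 2-bit multiply consumes 1, so layers quantized to lower bitwidths obtain proportionally more parallel multiplications from the same hardware, which is the trade-off the bit-flexible architecture exploits.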
Year: 2017
DOI: 10.1109/ISCA.2018.00069
Venue: International Symposium on Computer Architecture
Keywords: Bit-Level Composability, Dynamic Composability, Deep Neural Networks, Accelerators, DNN, Convolutional Neural Networks, CNN, Long Short-Term Memory, LSTM, Recurrent Neural Networks, RNN, Quantization, Bit Fusion, Bit Brick
DocType:
Volume: abs/1712.01507
Journal:
ISSN: 1063-6897
ISBN: 978-1-5386-5984-7
Citations: 42
PageRank: 0.93
References: 35
Authors: 7
Name | Order | Citations | PageRank
Hardik Sharma | 1 | 86 | 3.00
Jongse Park | 2 | 303 | 12.47
Suda, N. | 3 | 265 | 15.18
Liangzhen Lai | 4 | 167 | 13.14
Benson Chau | 5 | 42 | 0.93
Vikas Chandra | 6 | 691 | 59.76
H. Esmaeilzadeh | 7 | 1443 | 69.71