Title
Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators
Abstract
ReRAM-based accelerators have shown great potential for accelerating DNN inference because ReRAM crossbars can perform analog matrix-vector multiplication (MVM) with low latency and energy consumption. However, these crossbars require analog-to-digital converters (ADCs), which account for a significant fraction of the cost of MVM operations. The ADC overhead can be mitigated via partial sum quantization, yet prior quantization flows for DNN inference accelerators do not consider it, because partial sum quantization is of little relevance to traditional digital architectures. To address this issue, we propose a mixed precision quantization scheme for ReRAM-based DNN inference accelerators in which weight quantization, input quantization, and partial sum quantization are jointly applied to each DNN layer. We also propose an automated quantization flow, powered by deep reinforcement learning, that searches the resulting large design space for the best quantization configuration. Our evaluation shows that the proposed mixed precision quantization scheme and quantization flow reduce inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while losing only 1.18% in DNN inference accuracy.
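To make the role of partial sum quantization concrete, here is a simplified Python model of one layer running on crossbars. It is a sketch under stated assumptions, not the authors' implementation: it assumes signed per-tensor weight codes, unsigned input codes (e.g. post-ReLU activations), crossbars with a hypothetical `rows` inputs each, and an ADC modeled as flooring away the low-order bits of each partial sum that do not fit in `psum_bits`. Real ReRAM designs typically also bit-slice weights across cells and stream inputs bit-serially, which this sketch omits. All names (`LayerQuantConfig`, `quantize_weights`, `quantize_inputs`, `adc`, `crossbar_mvm`, `layer_forward`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerQuantConfig:
    """Hypothetical per-layer precision choice: one point in the search space."""
    weight_bits: int  # signed weight codes in [-(2**(b-1) - 1), 2**(b-1) - 1]
    input_bits: int   # unsigned input codes in [0, 2**b - 1] (e.g. post-ReLU)
    psum_bits: int    # assumed ADC resolution for each crossbar partial sum


def quantize_weights(w, bits):
    """Per-tensor symmetric uniform quantization; returns (codes, scale)."""
    qmax = (1 << (bits - 1)) - 1
    w_max = max(abs(v) for row in w for v in row)
    scale = w_max / qmax if w_max > 0 else 1.0
    return [[round(v / scale) for v in row] for row in w], scale


def quantize_inputs(x, bits):
    """Unsigned uniform quantization, clamping to [0, qmax]; returns (codes, scale)."""
    qmax = (1 << bits) - 1
    x_max = max(x)
    scale = x_max / qmax if x_max > 0 else 1.0
    return [min(max(round(v / scale), 0), qmax) for v in x], scale


def adc(psum, cfg, rows):
    """Assumed ADC model: keep only the top psum_bits of the worst-case
    partial-sum width of a crossbar, flooring away the low-order bits."""
    full_bits = cfg.weight_bits + cfg.input_bits + (rows - 1).bit_length()
    shift = max(0, full_bits - cfg.psum_bits)
    return (psum >> shift) << shift


def crossbar_mvm(w_codes, x_codes, cfg, rows):
    """Integer y[j] = sum_i x[i] * W[i][j], split into crossbars of `rows` inputs.
    Each crossbar's partial sum goes through the ADC, then is accumulated digitally."""
    n_in, n_out = len(x_codes), len(w_codes[0])
    y = [0] * n_out
    for start in range(0, n_in, rows):
        tile = range(start, min(start + rows, n_in))
        for j in range(n_out):
            psum = sum(x_codes[i] * w_codes[i][j] for i in tile)  # analog MVM in one array
            y[j] += adc(psum, cfg, rows)
    return y


def layer_forward(w, x, cfg, rows=128):
    """Quantize weights and inputs, run the crossbar model, rescale to real values."""
    w_codes, w_scale = quantize_weights(w, cfg.weight_bits)
    x_codes, x_scale = quantize_inputs(x, cfg.input_bits)
    return [v * w_scale * x_scale for v in crossbar_mvm(w_codes, x_codes, cfg, rows)]


if __name__ == "__main__":
    w = [[0.5, -0.25], [0.1, 0.3], [-0.2, 0.4]]
    x = [1.0, 0.5, 0.25]
    # Lowering psum_bits shrinks the ADC but increases the output error.
    for psum_bits in (23, 16, 12):
        print(psum_bits, layer_forward(w, x, LayerQuantConfig(8, 8, psum_bits)))
```

When `psum_bits` covers the full worst-case width, the crossbar result equals the exact integer dot product; below that, each partial sum loses its low-order bits before digital accumulation. Because every layer picks its own (weight, input, partial sum) triple, k candidate widths per knob give k³ options per layer and (k³)^L options for an L-layer network, which is why exhaustive enumeration quickly becomes impractical and a learned search is used instead.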
Year
2021
DOI
10.1145/3394885.3431554
Venue
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)
Keywords
Mixed precision quantization, ReRAM, DNN inference accelerators
DocType
Conference
ISSN
2153-6961
ISBN
978-1-7281-8057-1
Citations
2
PageRank
0.37
References
0
Authors
18
Name                        Order  Citations  PageRank
Sitao Huang                 1      81         9.68
Aayush Ankit                2      78         11.75
Plinio Silveira             3      2          0.37
Rodrigo Antunes             4      2          0.37
Sai Rahul Chalamalasetti    5      136        16.33
Izzat El Hajj               6      79         6.91
Dong Eun Kim                7      9          1.53
Glaucimar Aguiar            8      2          0.37
Pedro Bruel                 9      2          0.37
Sergey Serebryakov          10     2          2.06
Cong Xu                     11     1154       48.25
Can Li                      12     2          0.71
Paolo Faraboschi            13     974        81.37
John Paul Strachan          14     280        17.84
Deming Chen                 15     1432       127.66
Kaushik Roy                 16     239        20.51
Wen-mei W. Hwu              17     4322       511.62
Dejan S. Milojicic          18     249        31.80