Title
Cursor-based Adaptive Quantization for Deep Convolutional Neural Network
Abstract
Recent years have witnessed wide application of deep convolutional neural networks (DCNNs) in many scenarios. However, their large computational cost and memory consumption remain barriers for applications with constrained computing resources. Model quantization is a common method to reduce the storage and computation burden by decreasing the bit width. In this work, we propose a novel cursor-based adaptive quantization method using differentiable architecture search (DAS). The multi-bit quantization mechanism is formulated as a DAS process with a continuous cursor that represents the quantization bit width. The cursor-based DAS adaptively searches for the desired quantization bit width for each layer, and the DAS process can be solved via an alternating approximate optimization procedure. We further devise a new loss function for the search process that jointly optimizes the accuracy and the parameter size of the model. In the quantization step, following a new strategy, the two integers closest to the cursor are adopted together as the bit widths to quantize the DCNN, which reduces quantization noise and avoids the local convergence problem. Comprehensive experiments on benchmark datasets show that our cursor-based adaptive quantization approach efficiently obtains smaller models with comparable or even better classification accuracy.
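A minimal sketch of the two-integer quantization idea described in the abstract: a continuous cursor is bracketed by its two nearest integer bit widths, each used for uniform quantization, and the results are blended. The uniform symmetric quantizer, the linear blend by the cursor's fractional part, and the function names are assumptions for illustration; the paper's exact combination strategy may differ.

```python
import numpy as np

def uniform_quantize(w, bits):
    # Symmetric uniform quantization of tensor w to an integer bit width.
    levels = 2 ** (bits - 1) - 1          # e.g. 3 bits -> 3 positive levels
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def cursor_quantize(w, cursor):
    # Quantize with the two integers bracketing the continuous cursor and
    # blend by the cursor's fractional part (assumed linear interpolation;
    # the paper's combination rule may differ).
    lo = int(np.floor(cursor))
    hi = lo + 1
    frac = cursor - lo
    return (1 - frac) * uniform_quantize(w, lo) + frac * uniform_quantize(w, hi)

w = np.random.randn(4, 4).astype(np.float32)
q = cursor_quantize(w, cursor=3.4)        # effective bit width between 3 and 4
```

Because the blend is differentiable in the cursor, a gradient-based search over per-layer bit widths becomes possible, which is the core of the DAS formulation.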
Year
2021
DOI
10.1109/IJCNN52387.2021.9533578
Venue
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
Keywords
Model Compression, Quantization, Deep Neural Network
DocType
Conference
ISSN
2161-4393
Citations
0
PageRank
0.34
References
0
Authors
5
Name | Order | Citations | PageRank
Baopu Li | 1 | 348 | 30.88
Yanwen Fan | 2 | 1 | 1.41
Zhihong Pan | 3 | 3 | 2.80
Zhiyu Cheng | 4 | 0 | 0.34
Gang Zhang | 5 | 2 | 3.58