Abstract |
---|
Recent years have witnessed wide application of deep convolutional neural networks (DCNNs) in diverse scenarios. However, their large computational cost and memory consumption are barriers for computation-constrained applications. Model quantization is a common method to reduce the storage and computation burden by decreasing the bit width. In this work, we propose a novel cursor-based adaptive quantization method using differentiable architecture search (DAS). The mixed-bit quantization mechanism is formulated as a DAS process with a continuous cursor that represents the quantization bit width, and the cursor-based DAS adaptively searches for the desired bit width of each layer. The DAS process is solved via an alternating approximate optimization procedure. We further devise a new loss function in the search process to jointly optimize the accuracy and the parameter size of the model. In the quantization step, based on a new strategy, the two integers closest to the cursor are adopted together as the bit widths to quantize the DCNN, which reduces quantization noise and avoids the local convergence problem. Comprehensive experiments on benchmark datasets show that our cursor-based adaptive quantization approach efficiently obtains smaller models with comparable or even better classification accuracy. |
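The abstract's core idea, a continuous cursor whose two nearest integers both serve as quantization bit widths, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the uniform symmetric quantizer and the blending of the two bit widths by the cursor's fractional part are assumptions, since the abstract does not specify the exact combination strategy, and the names `quantize` and `cursor_quantize` are hypothetical.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of a weight tensor to a given bit width
    (an assumed quantizer; the paper's scheme may differ)."""
    levels = 2 ** (bits - 1) - 1            # number of positive levels in a signed range
    scale = np.max(np.abs(w)) / levels      # map the largest magnitude to the top level
    return np.round(w / scale) * scale      # round to the nearest representable value

def cursor_quantize(w, cursor):
    """Quantize w with the two integer bit widths closest to the continuous
    cursor, blended by the cursor's fractional part (illustrative strategy)."""
    lo, hi = int(np.floor(cursor)), int(np.ceil(cursor))
    if lo == hi:                            # cursor already sits on an integer bit width
        return quantize(w, lo)
    frac = cursor - lo                      # weight toward the higher bit width
    return (1 - frac) * quantize(w, lo) + frac * quantize(w, hi)

w = np.random.randn(4, 4).astype(np.float32)
wq = cursor_quantize(w, cursor=3.3)         # mostly 3-bit, partly 4-bit quantization
```

Because both quantizers are differentiable almost everywhere in `w` and the blend weight is differentiable in the cursor, a continuous bit-width parameter of this kind can in principle be optimized by gradient descent, which is what makes the search differentiable.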
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/IJCNN52387.2021.9533578 | 2021 International Joint Conference on Neural Networks (IJCNN) |
Keywords | DocType | ISSN |
---|---|---|
Model Compression, Quantization, Deep Neural Network | Conference | 2161-4393 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Baopu Li | 1 | 348 | 30.88 |
Yanwen Fan | 2 | 1 | 1.41 |
Zhihong Pan | 3 | 3 | 2.80 |
Zhiyu Cheng | 4 | 0 | 0.34 |
Gang Zhang | 5 | 2 | 3.58 |