Title
Fully integer-based quantization for mobile convolutional neural network inference
Abstract
Deploying deep convolutional neural networks on mobile devices is challenging because of the conflict between their heavy computational overhead and the hardware’s restricted computing capacity. Network quantization is typically used to alleviate this problem. However, we found that a “datatype mismatch” issue in existing low-bitwidth quantization approaches can generate severe instruction redundancy, dramatically reducing their running efficiency on mobile devices. We therefore propose a novel quantization approach that requires only integer arithmetic during the inference stage of the quantized model. To this end, we improve the quantization function to compel the quantized values to follow a standard integer format, and we simultaneously quantize the batch normalization parameters with a logarithm-like method. The quantized model thus keeps the advantages of low-bitwidth representation while avoiding the “datatype mismatch” issue and the corresponding instruction redundancy. Comprehensive experiments show that our method achieves prediction accuracy comparable to other state-of-the-art methods while reducing run-time latency by a large margin. Our fully integer-based quantized ResNet-18 has 4-bit weights, 4-bit activations, and only a 0.7% top-1 and 0.4% top-5 accuracy drop on the ImageNet dataset. An assembly-language implementation of a series of building blocks reaches up to 4.33× the speed of the original full-precision version on an ARMv8 CPU.
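As a rough illustration of the two ingredients the abstract describes, below is a minimal NumPy sketch of symmetric integer quantization plus a logarithm-like (power-of-two) treatment of the rescaling factor, so that the multiply-accumulate path stays in integers and rescaling reduces to a bit shift. Function names, the symmetric-range choice, and the toy layer are our assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def quantize_uniform(x, bits=4):
    """Symmetric uniform quantization of a float tensor to signed
    `bits`-bit integers stored in int32 containers."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax        # float step size (assumed symmetric)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def pow2_exponent(s):
    """'Logarithm-like' step: round a positive scale to the nearest
    power of two, so rescaling becomes an integer bit shift."""
    return int(np.round(np.log2(s)))

# Toy fully-connected layer: quantize weights and activations to 4 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
x = rng.standard_normal(8)

wq, sw = quantize_uniform(w)
xq, sx = quantize_uniform(x)

acc = wq @ xq                    # integer-only multiply-accumulate
k = pow2_exponent(sw * sx)       # fold both scales into one exponent
y_int = acc * 2.0 ** k           # would be a single shift in fixed-point
y_ref = w @ x                    # full-precision reference
```

Because the combined scale is snapped to a power of two, the dequantization step never reintroduces floating-point multiplies into the inner loop, which is the property that avoids the "datatype mismatch" instruction redundancy discussed in the abstract.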
Year
2021
DOI
10.1016/j.neucom.2020.12.035
Venue
Neurocomputing
Keywords
Convolutional neural network, Quantization, Model compression, Deep learning
DocType
Journal
Volume
432
ISSN
0925-2312
Citations
1
PageRank
0.35
References
0
Authors
4
Name          Order  Citations  PageRank
Peng Peng     1      24         7.11
Mingyu You    2      160        16.22
Weisheng Xu   3      65         8.28
Jiaxin Li     4      38         15.85