Title
Improving Neural Network Efficiency via Post-training Quantization with Adaptive Floating-Point.
Abstract
Model quantization has emerged as an indispensable technique for efficient inference with advanced Deep Neural Networks (DNNs). It converts full-precision (32-bit floating-point) model parameters into a hardware-friendly data representation with a shorter bit-width, which not only reduces the model size but also lowers the computational complexity. Nevertheless, prior model quantization schemes either suffer from inefficient data encoding, leading to uncompetitive compression rates, or require a time-consuming quantization-aware training process. In this work, we propose a novel Adaptive Floating-Point (AFP) format, a variant of the standard IEEE-754 floating-point format with a flexible configuration of the exponent and mantissa segments. Leveraging AFP for model quantization (i.e., encoding the parameters) significantly improves the model compression rate without accuracy degradation or model retraining. We also highlight that AFP effectively eliminates the computationally intensive de-quantization step present in the dynamic quantization techniques adopted by popular machine learning frameworks (e.g., PyTorch and TensorRT). Moreover, we develop a framework that automatically optimizes and selects an adequate AFP configuration for each layer, thus maximizing the compression efficacy. Our experiments indicate that AFP-encoded ResNet-50 and MobileNet-v2 incur only ~0.04% and ~0.6% accuracy degradation, respectively, w.r.t. their full-precision counterparts. Our method outperforms state-of-the-art works by 1.1% in accuracy at the same bit-width while reducing inference energy consumption by 11.2x.
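As a rough illustration of the core idea, a floating-point encoding whose exponent and mantissa widths can be tuned per layer, the sketch below simulates rounding a weight tensor to such a format. It is not the authors' AFP algorithm; the function name, bit-width choices, and rounding scheme are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's implementation): simulate quantizing a
# tensor to a low-bit floating-point format with a configurable split
# between exponent bits and mantissa bits.
import numpy as np

def quantize_float(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value representable with `exp_bits` exponent
    bits and `man_bits` mantissa bits (sign bit implied). Illustrative only."""
    x = np.asarray(x, dtype=np.float64)
    sign = np.sign(x)
    mag = np.abs(x)

    # Biased-exponent range for `exp_bits` bits (no reserved codes here).
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = bias
    min_exp = 1 - bias

    # Decompose each magnitude as m * 2**e with m in [1, 2).
    e = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    e = np.clip(e, min_exp, max_exp)

    # Round the mantissa to `man_bits` fractional bits at that exponent.
    scale = 2.0 ** (e - man_bits)
    q = np.round(mag / scale) * scale

    # Clamp to the largest representable magnitude.
    max_val = (2.0 - 2.0 ** (-man_bits)) * 2.0 ** max_exp
    q = np.minimum(q, max_val)
    return sign * q

# Example: quantize random "weights" with an 8-bit budget (1 + 4 + 3 bits).
w = np.random.randn(5).astype(np.float32)
print(w)
print(quantize_float(w, exp_bits=4, man_bits=3))
```

A per-layer search, as described in the abstract, would sweep (exp_bits, man_bits) pairs under a total bit budget and keep the configuration with the smallest quantization error for that layer's weight distribution.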
Year
2021
DOI
10.1109/ICCV48922.2021.00523
Venue
ICCV
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
8
Name              Order  Citations  PageRank
Fangxin Liu       1      0          1.69
Wenbo Zhao        2      25         6.07
Zhezhi He         3      136        25.37
Yanzhi Wang       4      10821      36.11
Zongwu Wang       5      0          4.06
Changzhi Dai      6      0          0.34
Xiaoyao Liang     7      585        45.81
Li Jiang          8      286        31.86