HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision - Citegraph

Paper Info

Title
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision

Abstract
Model size and inference speed/power have become a major challenge in the deployment of neural networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra-low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art shows that we can achieve similar/better accuracy with 8× activation compression ratio on ResNet20, as compared to DNAS, and up to 1% higher accuracy with up to 14% smaller models on ResNet50 and Inception-V3, compared to recently proposed methods of RVQuant and HAQ. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above 68% top1 accuracy on ImageNet.

Year	2019
DOI	10.1109/ICCV.2019.00038
Venue	2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Keywords	neural networks, mixed-precision quantization, deep networks, block-wise fine-tuning order, second-order quantization method, Hessian spectrum, deterministic fine-tuning order, SqueezeNext models, factorial complexity, Hessian aware quantization, HAWQ
Field	Mixed precision, Exponential function, Pattern recognition, Computer science, Inference, Hessian matrix, Factorial, Algorithm, Compression ratio, Artificial intelligence, Artificial neural network, Quantization (signal processing)
DocType	Journal
Volume	abs/1905.03696
Issue	1
ISSN	1550-5499
ISBN	978-1-7281-4804-5
Citations	12
PageRank	0.61
References	3
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Z. Dong	1	24	4.86
Zhewei Yao	2	31	10.58
Amir Gholami	3	66	12.99
Michael W. Mahoney	4	3297	218.10
Kurt Keutzer	5	5040	801.67
