Title
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
Abstract
Model size and inference speed/power have become major challenges in the deployment of neural networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra-low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may tolerate lower precision than others. However, there is no systematic way to determine the precision of different layers. A brute-force approach is not feasible for deep networks, as the search space for mixed precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining the block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers. We show the results of our method on CIFAR-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art methods shows that we can achieve similar/better accuracy with 8× activation compression ratio on ResNet20, as compared to DNAS, and up to 1% higher accuracy with up to 14% smaller models on ResNet50 and Inception-V3, compared to the recently proposed methods RVQuant and HAQ. Furthermore, we show that we can quantize SqueezeNext to just 1 MB model size while achieving above 68% top-1 accuracy on ImageNet.
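The abstract says that per-layer precision is chosen from each layer's Hessian spectrum. As a rough illustration only, the sketch below estimates the largest Hessian eigenvalue of each layer by power iteration on Hessian-vector products, then ranks the layers by that value. Everything in it is an assumption made for the example and not taken from this record: the toy quadratic losses, the finite-difference Hessian-vector product, the names `make_quadratic_grad`, `hvp`, `top_eigenvalue`, `layer_a`, `layer_b`, and the choice to rank by the top eigenvalue alone.

```python
import math
import random


def make_quadratic_grad(diag):
    """Gradient of a toy loss L(w) = 0.5 * sum(d_i * w_i^2); its Hessian is diag(d).

    Hypothetical stand-in for the loss gradient with respect to one layer's weights.
    """
    return lambda w: [d * wi for d, wi in zip(diag, w)]


def hvp(grad_fn, w, v, eps=1e-3):
    """Approximate the Hessian-vector product H v by a central difference of the gradient."""
    plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def top_eigenvalue(grad_fn, w, iters=100, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        lam = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return lam


# Two hypothetical "layers", each described by the diagonal of its toy Hessian.
layers = {"layer_a": [4.0, 1.0, 0.25], "layer_b": [0.5, 0.2]}

scores = {
    name: top_eigenvalue(make_quadratic_grad(diag), [0.0] * len(diag))
    for name, diag in layers.items()
}

# Assumption for illustration: a larger top eigenvalue marks a more sensitive layer,
# so layers earlier in this ranking would be candidates for higher precision.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

In a real network the gradient would come from automatic differentiation and the Hessian-vector product from a second backward pass, so the Hessian is never formed explicitly; the finite-difference version here only keeps the example dependency-free.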
Year: 2019
DOI: 10.1109/ICCV.2019.00038
Venue: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Keywords: neural networks, mixed-precision quantization, deep networks, block-wise fine-tuning order, second-order quantization method, Hessian spectrum, deterministic fine-tuning order, SqueezeNext models, factorial complexity, Hessian aware quantization, HAWQ
Field: Mixed precision, Exponential function, Pattern recognition, Computer science, Inference, Hessian matrix, Factorial, Algorithm, Compression ratio, Artificial intelligence, Artificial neural network, Quantization (signal processing)
DocType:
Journal
Volume
Issue
ISSN
abs/1905.03696
1
1550-5499
ISBN: 978-1-7281-4804-5
Citations: 12
PageRank: 0.61
References: 3
Authors: 5
Name                Order  Citations  PageRank
Z. Dong             1      24         4.86
Zhewei Yao          2      31         10.58
Amir Gholami        3      66         12.99
Michael W. Mahoney  4      3297       218.10
Kurt Keutzer        5      5040       801.67