Abstract |
---|
Model size and inference speed and power consumption have become major challenges in deploying neural networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra-low precision leads to significant accuracy degradation. A novel solution is to use mixed-precision quantization, since some layers of the network may tolerate lower precision than others. However, there is no systematic way to determine the precision of different layers. A brute-force approach is not feasible for deep networks, as the search space for mixed precision is exponential in the number of layers. Another challenge is the factorial complexity of determining the block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers. We show the results of our method on CIFAR-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Compared with the state of the art, HAWQ achieves similar or better accuracy with an 8× activation compression ratio on ResNet20, as compared to DNAS, and up to 1% higher accuracy with up to 14% smaller models on ResNet50 and Inception-V3, compared to the recently proposed methods RVQuant and HAQ. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above 68% top-1 accuracy on ImageNet. |
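
The idea of ranking layers by their Hessian spectrum can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the spectrum is computed or how it maps to bit widths, so the sketch assumes power iteration for the top eigenvalue, uses tiny hand-written matrices (`toy_hessians`) in place of real per-layer Hessians, and assigns a hypothetical list of bit widths (`bit_choices`) in rank order so that layers with larger curvature keep more bits.

```python
import math
import random


def top_eigenvalue(matvec, dim, iters=100, seed=0):
    """Estimate the dominant eigenvalue of a symmetric operator by power iteration.

    `matvec(v)` must return H @ v. For a real network this would be a
    Hessian-vector product from automatic differentiation (assumption).
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = matvec(v)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return eig


def dense_matvec(matrix):
    """Wrap an explicit matrix (list of rows) as a matrix-vector product."""
    return lambda v: [sum(a * b for a, b in zip(row, v)) for row in matrix]


# Hypothetical stand-ins for per-layer Hessians; real ones are never formed explicitly.
toy_hessians = {
    "conv1": [[4.0, 1.0], [1.0, 3.0]],
    "conv2": [[0.5, 0.1], [0.1, 0.2]],
    "fc": [[2.0, 0.0], [0.0, 1.0]],
}

eigs = {name: top_eigenvalue(dense_matvec(h), len(h)) for name, h in toy_hessians.items()}

# Rank layers from most to least sensitive (largest top eigenvalue first).
ranking = sorted(eigs, key=eigs.get, reverse=True)

# Hypothetical bit widths, handed out in rank order: sharper curvature, more bits.
bit_choices = [8, 4, 2]
bits = dict(zip(ranking, bit_choices))

for name in ranking:
    print(f"{name}: top eigenvalue ~ {eigs[name]:.3f}, bits = {bits[name]}")
```

Only the relative ordering matters in this sketch; the bit widths and layer names are placeholders chosen for illustration.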
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICCV.2019.00038 | 2019 IEEE/CVF International Conference on Computer Vision (ICCV) |
Keywords | Field | DocType |
neural networks,mixed-precision quantization,deep networks,block-wise fine-tuning order,second-order quantization method,Hessian spectrum,deterministic fine-tuning order,SqueezeNext models,factorial complexity,Hessian aware quantization,HAWQ | Mixed precision,Exponential function,Pattern recognition,Computer science,Inference,Hessian matrix,Factorial,Algorithm,Compression ratio,Artificial intelligence,Artificial neural network,Quantization (signal processing) | Journal |
Volume | Issue | ISSN |
abs/1905.03696 | 1 | 1550-5499 |
ISBN | Citations | PageRank |
978-1-7281-4804-5 | 12 | 0.61 |
References | Authors | |
3 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Z. Dong | 1 | 24 | 4.86 |
Zhewei Yao | 2 | 31 | 10.58 |
Amir Gholami | 3 | 66 | 12.99 |
Michael W. Mahoney | 4 | 3297 | 218.10 |
Kurt Keutzer | 5 | 5040 | 801.67 |