Title
Performance Trade-offs in Weight Quantization for Memory-Efficient Inference
Abstract
Over the past decade, Deep Neural Networks (DNNs) trained with Deep Learning (DL) frameworks have become the workhorse for solving a wide variety of computational tasks in big data environments. To date, DNNs have relied on large amounts of computational power to reach peak performance, typically exploiting the high computational bandwidth of GPUs while straining available memory bandwidth and capacity. With ever-increasing data complexity and more stringent energy constraints in Internet-of-Things (IoT) application environments, there has been growing interest in developing more efficient DNN inference methods that economize on random-access memory usage for weight access. Herein, we present a systematic analysis of the performance trade-offs of quantized weight representations at variable bit length for memory-efficient inference in pre-trained DNN models. We vary the mantissa and exponent bit lengths in the representation of the network parameters and examine both the effect of Dropout regularization during pre-training and the impact of two weight truncation mechanisms: stochastic and deterministic rounding. We show a drastic reduction in memory requirements, down to 4 bits per weight, while maintaining near-optimal test performance of low-complexity DNNs pre-trained on the MNIST and CIFAR-10 datasets. These results offer a simple methodology for achieving high memory and computation efficiency of inference on dedicated low-power DNN hardware for IoT, directly from pre-trained, high-resolution DNNs obtained with standard DL algorithms.
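For readers unfamiliar with the two truncation mechanisms named in the abstract, the following minimal NumPy sketch contrasts deterministic (round-to-nearest) and stochastic rounding. It is an illustrative assumption, not the authors' code: the paper varies mantissa and exponent bit lengths of a floating-point weight format, whereas this sketch, including the hypothetical quantize_weights helper and its 4-bit default, uses a simpler uniform signed fixed-point grid to show only the rounding step.

import numpy as np

def quantize_weights(w, n_bits=4, mode="deterministic", rng=None):
    # Hypothetical helper (not from the paper): quantize a weight array to a
    # signed fixed-point grid with n_bits per weight, using either
    # round-to-nearest ("deterministic") or stochastic rounding.
    rng = rng if rng is not None else np.random.default_rng()
    q_max = 2 ** (n_bits - 1) - 1                  # e.g. 7 for 4-bit signed weights
    scale = max(np.max(np.abs(w)), 1e-12) / q_max  # map the largest weight onto the grid
    x = w / scale                                  # continuous positions on the integer grid

    if mode == "deterministic":
        q = np.round(x)                            # round to the nearest grid point
    elif mode == "stochastic":
        lo = np.floor(x)
        frac = x - lo                              # distance above the lower grid point
        q = lo + (rng.random(w.shape) < frac)      # round up with probability = frac
    else:
        raise ValueError(f"unknown mode: {mode}")

    return np.clip(q, -q_max - 1, q_max) * scale   # de-quantized weights for inference

if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
    print(quantize_weights(w, n_bits=4, mode="deterministic"))
    print(quantize_weights(w, n_bits=4, mode="stochastic"))

Stochastic rounding is unbiased in expectation (the expected quantized value equals the original value), which is why it often preserves accuracy better than round-to-nearest at very low bit widths.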
Year: 2019
DOI: 10.1109/AICAS.2019.8771473
Venue: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Keywords: Quantized neural networks, stochastic rounding, low-precision, energy efficient, floating-point precision
Field: MNIST database, Memory bandwidth, Floating point, High memory, Inference, Computer science, Bandwidth (signal processing), Artificial intelligence, Deep learning, Quantization (signal processing), Computer engineering
DocType: Conference
ISBN: 978-1-5386-7885-5
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                 Order   Citations   PageRank
Pablo M. Tostado     1       0           0.34
Bruno Pedroni        2       86          8.46
Gert Cauwenberghs    3       1262        167.20