Title
Performance Trade-offs in Weight Quantization for Memory-Efficient Inference
Abstract
Over the past decade, Deep Neural Networks (DNNs) trained with Deep Learning (DL) frameworks have become the workhorse for solving a wide variety of computational tasks in big data environments. To date, DNNs have relied on large amounts of computational power to reach peak performance, typically exploiting the high computational bandwidth of GPUs while straining available memory bandwidth and capacity. With ever-increasing data complexity and more stringent energy constraints in Internet-of-Things (IoT) application environments, there has been growing interest in developing more efficient DNN inference methods that economize on random-access memory usage for weight access. Herein, we present a systematic analysis of the performance trade-offs of quantized weight representations at variable bit length for memory-efficient inference in pre-trained DNN models. We vary the mantissa and exponent bit lengths in the representation of the network parameters and examine both the effect of Dropout regularization during pre-training and the impact of two weight truncation mechanisms: stochastic and deterministic rounding. We show a drastic reduction in memory requirements, down to 4 bits per weight, while maintaining near-optimal test performance of low-complexity DNNs pre-trained on the MNIST and CIFAR-10 datasets. These results offer a simple methodology for achieving high memory and computation efficiency of inference on dedicated low-power DNN hardware for IoT, directly from pre-trained, high-resolution DNNs obtained with standard DL algorithms.
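For readers unfamiliar with the two truncation mechanisms named in the abstract, the following minimal NumPy sketch contrasts deterministic (round-to-nearest) and stochastic rounding. It is an illustrative assumption, not the authors' code: the paper varies mantissa and exponent bit lengths of a floating-point weight format, whereas this sketch, including the hypothetical quantize_weights helper and its 4-bit default, uses a simpler uniform signed fixed-point grid to show only the rounding step.

import numpy as np

def quantize_weights(w, n_bits=4, mode="deterministic", rng=None):
    # Hypothetical helper (not from the paper): quantize a weight array to a
    # signed fixed-point grid with n_bits per weight, using either
    # round-to-nearest ("deterministic") or stochastic rounding.
    rng = rng if rng is not None else np.random.default_rng()
    q_max = 2 ** (n_bits - 1) - 1                  # e.g. 7 for 4-bit signed weights
    scale = max(np.max(np.abs(w)), 1e-12) / q_max  # map the largest weight onto the grid
    x = w / scale                                  # continuous positions on the integer grid

    if mode == "deterministic":
        q = np.round(x)                            # round to the nearest grid point
    elif mode == "stochastic":
        lo = np.floor(x)
        frac = x - lo                              # distance above the lower grid point
        q = lo + (rng.random(w.shape) < frac)      # round up with probability = frac
    else:
        raise ValueError(f"unknown mode: {mode}")

    return np.clip(q, -q_max - 1, q_max) * scale   # de-quantized weights for inference

if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
    print(quantize_weights(w, n_bits=4, mode="deterministic"))
    print(quantize_weights(w, n_bits=4, mode="stochastic"))

Stochastic rounding is unbiased in expectation (the expected quantized value equals the original value), which is why it often preserves accuracy better than round-to-nearest at very low bit widths.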
Year: 2019
DOI: 10.1109/AICAS.2019.8771473
Venue: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Keywords: Quantized neural networks, stochastic rounding, low-precision, energy efficient, floating-point precision
Field: MNIST database, Memory bandwidth, Floating point, High memory, Inference, Computer science, Bandwidth (signal processing), Artificial intelligence, Deep learning, Quantization (signal processing), Computer engineering
DocType: Conference
ISBN: 978-1-5386-7885-5
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                 Order   Citations   PageRank
Pablo M. Tostado     1       0           0.34
Bruno Pedroni        2       86          8.46
Gert Cauwenberghs    3       1262        167.20