Title
Implementing Convolutional Neural Networks Using Hartley Stochastic Computing With Adaptive Rate Feature Map Compression
Abstract
Energy consumption and latency are two important factors that limit the application of convolutional neural networks (CNNs), especially on embedded devices. Fourier-based frequency-domain (FD) convolution is a promising low-cost alternative to conventional spatial-domain (SD) implementations of CNNs, since FD convolution performs its operation with point-wise multiplications. However, in CNNs, the overhead of Fourier-based FD convolution surpasses its computational savings for small filter sizes. In this work, we propose to implement convolutional layers in the FD using the Hartley transform (HT) instead of the Fourier transform. We show that the HT can reduce convolution delay and energy consumption even for small filters. By taking the HT of the parameters, we replace convolution with point-wise multiplications. The HT also lets us compress input feature maps (IFMaps) in convolutional layers before convolving them with filters. To this end, we introduce two compression techniques: fixed-rate and adaptive-rate. In fixed-rate compression, we select FD IFMap coefficients with a constant pattern across all convolutional layers. In adaptive-rate compression, the network itself learns, during training, which coefficients to keep or discard. To optimize the hardware implementation of both methods, we utilize stochastic computing (SC) to perform the point-wise multiplications in the FD, reformulating the HT to better match SC. We show that, compared to conventional Fourier-based convolution, Hartley SC-based convolution achieves a $1.33\times$ speedup and a 23% energy reduction on a Virtex 7 FPGA when implementing AlexNet on CIFAR-10 with fixed-rate compression. Adaptive-rate compression yields a further 16% latency improvement and 15% energy reduction over the fixed-rate method.
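As context for the abstract's core idea, below is a minimal NumPy sketch (not the authors' implementation) of the two ingredients it combines: circular convolution computed via point-wise-style products in the Hartley domain, and a unipolar stochastic-computing multiply. All function names, the bitstream length, and the FFT-based reference check are illustrative assumptions.

# Minimal sketch of Hartley-domain convolution and SC multiplication.
# Assumed helpers (dht, idht, hartley_circular_conv, sc_multiply) are
# illustrative, not the paper's API.
import numpy as np

def dht(x):
    # Discrete Hartley transform via the FFT: H[k] = Re(F[k]) - Im(F[k]).
    F = np.fft.fft(x)
    return F.real - F.imag

def idht(H):
    # The DHT is its own inverse up to a 1/N scale factor.
    return dht(H) / len(H)

def hartley_circular_conv(x, y):
    # DHT convolution theorem (indices taken mod N):
    #   Z[k] = 0.5 * (X[k] * (Y[k] + Y[-k]) + X[-k] * (Y[k] - Y[-k]))
    X, Y = dht(x), dht(y)
    Xr = np.roll(X[::-1], 1)  # X[-k]
    Yr = np.roll(Y[::-1], 1)  # Y[-k]
    Z = 0.5 * (X * (Y + Yr) + Xr * (Y - Yr))
    return idht(Z)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)
# Reference: circular convolution through the Fourier domain.
direct = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))
assert np.allclose(hartley_circular_conv(x, y), direct)

def sc_multiply(a, b, n_bits=4096, rng=rng):
    # Unipolar stochastic multiply: a value in [0, 1] is encoded as the
    # probability of a 1 in a random bitstream; a bit-wise AND of two
    # streams then estimates the product of the encoded values.
    sa = rng.random(n_bits) < a
    sb = rng.random(n_bits) < b
    return np.mean(sa & sb)  # approaches a * b as n_bits grows

print(sc_multiply(0.5, 0.8))  # ~0.4, up to stochastic error

The sketch verifies that point-wise products of even/odd Hartley components reproduce circular convolution exactly, while the SC multiply trades multiplier hardware for cheap AND gates at the cost of stochastic error that shrinks with bitstream length.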
Year
2021
DOI
10.1109/OJCAS.2021.3123899
Venue
IEEE Open Journal of Circuits and Systems
Keywords
Deep neural networks, frequency domain transformation, hardware implementation, energy optimization, latency improvement, FPGA
DocType
Journal
Volume
2
ISSN
2644-1225
Citations
0
PageRank
0.34
References
0
Authors
4
Name             Order  Citations  PageRank
S. H. Mozafari   1      0          0.34
James J. Clark   2      402        86.34
Warren J. Gross  3      1106       113.38
B. H. Meyer      4      0          0.34