Title
High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture
Abstract
Deep neural networks have revolutionized applications across diverse domains such as autonomous vehicles, weather forecasting, cancer detection, surveillance, and traffic management. The convolutional neural network (CNN) is the state-of-the-art technique for many machine learning tasks in image and video processing. Deploying CNNs on embedded systems with limited processing power and tight power budgets is challenging. Recent studies have shown that field-programmable gate arrays (FPGAs) are effective hardware accelerators for CNNs, delivering high performance at low power. The majority of computations in CNNs are 2-D convolutions. The Winograd minimal filtering algorithm is one of the most efficient techniques for computing convolutions with small filters. CNNs also contain fully connected layers, which are computed using general matrix multiplication (GEMM). In this article, we propose a unified architecture named UniWiG, in which both Winograd-based convolution and GEMM are accelerated using the same set of processing elements. This approach makes efficient use of FPGA hardware resources across all layers of the CNN. Compared with the baseline GEMM-based architecture, the proposed architecture improves performance by 1.4× to 4.02× while using only 13% additional FPGA resources. We have mapped popular CNN models such as AlexNet and VGG-16 onto the proposed accelerator, and the measured performance compares favorably with other state-of-the-art implementations.
We have also analyzed the vulnerability of the accelerator to side-channel attacks. Preliminary investigations show that the UniWiG architecture is more robust to memory side-channel attacks than direct convolution-based techniques.
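The abstract does not describe UniWiG's internal dataflow, so the following is only a sketch under stated assumptions, not the authors' implementation. It shows the standard Winograd F(2×2, 3×3) algorithm, which computes a 2×2 output tile with 16 multiplications in its element-wise stage instead of the 36 needed by direct convolution, using the commonly published transform matrices (Lavin and Gray's formulation). It also shows one well-known way to recast the multi-channel element-wise stage as 16 ordinary matrix multiplications (one per transformed-tile position, contracting over input channels), which is a plausible route by which a single GEMM engine could serve both convolutional and fully connected layers. All function names (`matmul`, `winograd_tile`, `winograd_layer`, `direct_layer`, and so on) are hypothetical.

```python
from fractions import Fraction

# Winograd F(2x2, 3x3) transform matrices in their commonly published form
# (Lavin and Gray). Fractions keep arithmetic exact so results compare with ==.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1, 0, 0],
     [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(-1, 2), Fraction(1, 2)],
     [0, 0, 1]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matmul(a, b):
    """Plain GEMM, (m x n) @ (n x p); stands in for a shared PE array."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def direct_tile(d, g):
    """2x2 output of a 'valid' 3x3 correlation (CNN convolution) on a 4x4 tile: 36 multiplies."""
    return [[sum(d[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]


def winograd_tile(d, g):
    """Same 2x2 output via Y = A^T [(G g G^T) . (B^T d B)] A.

    The element-wise stage uses 16 multiplies; the transforms only involve
    the constants 0, +-1 and +-1/2 (adds and shifts in hardware).
    """
    U = matmul(matmul(G, g), transpose(G))    # 4x4 transformed filter
    V = matmul(matmul(BT, d), transpose(BT))  # 4x4 transformed input tile
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))


def winograd_layer(inputs, filters):
    """inputs: C x P grid of 4x4 tiles; filters: K x C grid of 3x3 kernels.

    Returns a K x P grid of 2x2 output tiles. The channel sum of the
    element-wise stage is done as one (K x C) @ (C x P) GEMM per tile position.
    """
    C, P, K = len(inputs), len(inputs[0]), len(filters)
    U = [[matmul(matmul(G, filters[k][c]), transpose(G)) for c in range(C)] for k in range(K)]
    V = [[matmul(matmul(BT, inputs[c][p]), transpose(BT)) for p in range(P)] for c in range(C)]
    out = [[[[0] * 4 for _ in range(4)] for _ in range(P)] for _ in range(K)]
    for xi in range(4):
        for nu in range(4):
            Ux = [[U[k][c][xi][nu] for c in range(C)] for k in range(K)]  # K x C
            Vx = [[V[c][p][xi][nu] for p in range(P)] for c in range(C)]  # C x P
            Mx = matmul(Ux, Vx)  # ordinary GEMM, same routine an FC layer would use
            for k in range(K):
                for p in range(P):
                    out[k][p][xi][nu] = Mx[k][p]
    return [[matmul(matmul(AT, out[k][p]), transpose(AT)) for p in range(P)] for k in range(K)]


def direct_layer(inputs, filters):
    """Reference: direct convolution summed over input channels."""
    K, C, P = len(filters), len(inputs), len(inputs[0])
    result = []
    for k in range(K):
        row = []
        for p in range(P):
            tiles = [direct_tile(inputs[c][p], filters[k][c]) for c in range(C)]
            row.append([[sum(t[i][j] for t in tiles) for j in range(2)] for i in range(2)])
        result.append(row)
    return result


if __name__ == "__main__":
    import random

    random.seed(0)
    C, P, K = 3, 2, 2
    inputs = [[[[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
               for _ in range(P)] for _ in range(C)]
    filters = [[[[random.randint(-5, 5) for _ in range(3)] for _ in range(3)]
                for _ in range(C)] for _ in range(K)]
    assert winograd_layer(inputs, filters) == direct_layer(inputs, filters)
```

The design choice being illustrated is that, once the input and filter tiles are transformed, the remaining heavy arithmetic is a batch of small matrix products, so the same `matmul` routine that would compute a fully connected layer also performs the Winograd-domain work. How UniWiG actually partitions and schedules this on its processing elements is not stated in the abstract.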
Year: 2019
DOI: 10.1109/TVLSI.2019.2941250
Venue: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Keywords: Convolution, Computer architecture, Field programmable gate arrays, Hardware, Computational modeling, Side-channel attacks, Parallel processing
Field: Architecture, Computer architecture, Computer science, Field-programmable gate array, Real-time computing
DocType: Journal
Volume: 27
Issue: 12
ISSN: 1063-8210
Citations: 6
PageRank: 0.49
References: 0
Authors: 4
Name | Order | Citations | PageRank
Srikant Manas Kala | 1 | 41 | 10.27
Babita R. Jose | 2 | 14 | 7.96
Jimson Mathew | 3 | 230 | 55.44
Nalesh, S. | 4 | 9 | 2.36