High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture - Citegraph

Paper Info

Title
High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture

Abstract
Deep neural networks have revolutionized applications in domains such as autonomous vehicles, weather forecasting, cancer detection, surveillance, and traffic management. The convolutional neural network (CNN) is the state-of-the-art technique for many machine learning tasks in the image and video processing domains. Deploying CNNs on embedded systems with limited processing power and small power budgets is challenging. Recent studies have shown the effectiveness of the field-programmable gate array (FPGA) as a hardware accelerator for CNNs that can deliver high performance at low power budgets. The majority of computations in CNNs involve 2-D convolution. The Winograd minimal filtering-based algorithm is the most efficient technique for calculating convolution for smaller filter sizes. CNNs also contain fully connected layers, which are computed using general matrix multiplication (GEMM). In this article, we propose a unified architecture named UniWiG, where both Winograd-based convolution and GEMM can be accelerated using the same set of processing elements. This approach leads to efficient utilization of FPGA hardware resources while computing all layers in the CNN. The proposed architecture shows a performance improvement in the range of $1.4\times$ to $4.02\times$ with only 13% additional FPGA resources with respect to the baseline GEMM-based architecture. We have mapped popular CNN models like AlexNet and VGG-16 onto the proposed accelerator, and the measured performance compares favorably with other state-of-the-art implementations.
We have also analyzed the vulnerability of the accelerator to side-channel attacks. Preliminary investigations show that the UniWiG architecture is more robust to memory side-channel attacks than direct convolution-based techniques.
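
The abstract's claim that Winograd minimal filtering is efficient for small filters can be illustrated with the smallest standard case, the 1-D F(2,3) algorithm: two outputs of a 3-tap filter cost 4 multiplications instead of the 6 used by direct convolution. The sketch below is illustrative only. The tile size F(2,3) and the function names `winograd_f23` and `direct_f23` are assumptions made for this example; the abstract does not state which Winograd tile size or data layout UniWiG uses.

```python
def direct_f23(d, g):
    """Direct 1-D convolution (correlation form): 2 outputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 multiplications.

    d: 4 input samples, g: 3 filter taps.
    The filter-side terms (g0+g1+g2)/2 and (g0-g1+g2)/2 depend only on the
    filter, so they can be precomputed once per filter.
    """
    # Filter transform (can be done offline)
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions and subtractions only)
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 multiplications
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform (additions and subtractions only)
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 1.0, 1.0]
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

Both functions return the same values for the same inputs; the saving comes from replacing multiplications with cheaper additions in the input and output transforms.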

Year	DOI	Venue
2019	10.1109/TVLSI.2019.2941250	IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Keywords	Field	DocType
Convolution,Computer architecture,Field programmable gate arrays,Hardware,Computational modeling,Side-channel attacks,Parallel processing	Architecture,Computer architecture,Computer science,Field-programmable gate array,Real-time computing	Journal
Volume	Issue	ISSN
27	12	1063-8210
Citations	PageRank	References
6	0.49	0
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Srikant Manas Kala	1	41	10.27
Babita R. Jose	2	14	7.96
Jimson Mathew	3	230	55.44
Nalesh, S.	4	9	2.36