Title
Optimizing Reconfigurable Recurrent Neural Networks
Abstract
This paper proposes a novel latency-hiding hardware architecture based on column-wise matrix-vector multiplication to eliminate data dependencies, improving the throughput of RNN-based systems. In addition, a flexible checkerboard tiling strategy is introduced to handle large weight matrices while supporting both element-based and vector-based parallelism. These optimizations better exploit the available parallelism, increasing run-time hardware utilization and boosting inference throughput. Furthermore, a quantization scheme with fine-tuning is proposed to maintain high accuracy. Evaluation results show that the proposed architecture enhances performance and energy efficiency with little accuracy loss, achieving 1.05 to 3.35 times better performance and 1.22 to 3.92 times better hardware utilization than a state-of-the-art FPGA-based LSTM design. This shows that our approach contributes to high-performance FPGA-based LSTM systems.
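The abstract names two techniques: column-wise matrix-vector multiplication (each partial update consumes only one input element, so accumulation can start before the whole input vector is ready) and checkerboard tiling of large weight matrices. The sketch below is only an illustrative software model of these two ideas, not the authors' hardware design; the function names and tile sizes are assumptions for illustration.

```python
import numpy as np

def mv_column_wise(W, x):
    """Column-wise matrix-vector product: y = sum over j of W[:, j] * x[j].

    Each update consumes a single element of x, so a hardware pipeline
    can begin accumulating as soon as x[j] is produced instead of
    waiting for the whole vector; this is the latency-hiding idea
    described in the abstract (modeled here in software).
    """
    y = np.zeros(W.shape[0], dtype=W.dtype)
    for j in range(W.shape[1]):
        y += W[:, j] * x[j]  # one column consumed per step
    return y

def mv_checkerboard(W, x, tile_rows=4, tile_cols=4):
    """Checkerboard-tiled matrix-vector product (software model).

    W is partitioned into a grid of tile_rows x tile_cols tiles. In
    hardware, rows within a tile would map to element-based parallelism
    and concurrently active tiles to vector-based parallelism; here the
    tiles are simply visited sequentially. Tile sizes are illustrative
    assumptions, not values from the paper.
    """
    rows, cols = W.shape
    y = np.zeros(rows, dtype=W.dtype)
    for i in range(0, rows, tile_rows):
        for j in range(0, cols, tile_cols):
            y[i:i + tile_rows] += W[i:i + tile_rows, j:j + tile_cols] @ x[j:j + tile_cols]
    return y

# Sanity check: both variants match the standard product.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12)).astype(np.float32)
x = rng.standard_normal(12).astype(np.float32)
assert np.allclose(mv_column_wise(W, x), W @ x, atol=1e-5)
assert np.allclose(mv_checkerboard(W, x), W @ x, atol=1e-5)
```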
Year
2020
DOI
10.1109/FCCM48280.2020.00011
Venue
2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Keywords
reconfigurable recurrent neural networks,latency-hiding hardware architecture,column-wise matrix-vector multiplication,data dependency,RNN models,weight matrices,element-based parallelism,vector-based parallelism,optimizations,run-time hardware utilization,boost inference throughput,energy efficiency,state-of-the-art FPGA-based LSTM design,high-performance FPGA-based LSTM systems,flexible checkerboard tiling
DocType
Conference
ISSN
2576-2613
ISBN
978-1-7281-5804-4
Citations
1
PageRank
0.39
References
7
Authors
8
Name             | Order | Citations | PageRank
Zhiqiang Que     | 1     | 26        | 9.81
Hiroki Nakahara  | 2     | 155       | 37.34
Eriko Nurvitadhi | 3     | 399       | 33.08
Hongxiang Fan    | 4     | 23        | 7.57
Chenglong Zeng   | 5     | 7         | 1.88
Jiuxi Meng       | 6     | 5         | 2.14
Xinyu Niu        | 7     | 135       | 23.16
Wayne Luk        | 8     | 3752      | 438.09