Title
Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC
Abstract
Recurrent neural networks (RNNs) provide state-of-the-art accuracy for analytics on sequence datasets (e.g., language modeling). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization that avoids 3 of the 6 dense matrix-vector multiplications (SGEMVs) that make up the majority of the computation in a GRU. We then study the opportunities to accelerate the remaining SGEMVs using FPGAs, in comparison to a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA provides superior performance/Watt over the CPU and GPU because its on-chip BRAMs, hard DSPs, and reconfigurable fabric allow fine-grained parallelism to be extracted efficiently from the small/medium-size matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAMs, and higher frequency have the potential to narrow the FPGA-ASIC efficiency gap.
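The abstract does not say which three SGEMVs the memoization removes. The sketch below illustrates one plausible reading, under two assumptions that are not confirmed by this record: that the memoized products are the three input-side ones (W·x for the update gate, reset gate, and candidate state), and that inputs come from a finite vocabulary so each W·x can be precomputed once per token. All names (`build_model`, `run_baseline`, `run_memoized`) and the tiny dimensions are hypothetical, and the GRU formulation is the standard textbook one rather than the authors' implementation.

```python
import math
import random

def matvec(m, v):
    """Dense matrix-vector product (stand-in for one SGEMV)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(wx_z, wx_r, wx_h, U, h):
    """One GRU step, given the three input-side products W*x.

    Only the three recurrent SGEMVs (U*h terms) are computed here.
    """
    z = [sigmoid(a + b) for a, b in zip(wx_z, matvec(U["z"], h))]
    r = [sigmoid(a + b) for a, b in zip(wx_r, matvec(U["r"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(wx_h, matvec(U["h"], rh))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def build_model(vocab=5, embed=4, hidden=3, seed=0):
    """Hypothetical toy model: random weights and one input vector per token."""
    rng = random.Random(seed)
    W = {g: rand_matrix(rng, hidden, embed) for g in "zrh"}
    U = {g: rand_matrix(rng, hidden, hidden) for g in "zrh"}
    E = rand_matrix(rng, vocab, embed)
    return W, U, E

def run_baseline(tokens, W, U, E, hidden=3):
    """Six SGEMVs per step: three on the input x, three on the state h."""
    h = [0.0] * hidden
    for t in tokens:
        x = E[t]
        h = gru_step(matvec(W["z"], x), matvec(W["r"], x),
                     matvec(W["h"], x), U, h)
    return h

def run_memoized(tokens, W, U, E, hidden=3):
    """Three SGEMVs per step: input-side products are looked up, not computed.

    Assumption: the vocabulary is finite, so W*x is tabulated once per token.
    """
    table = [{g: matvec(W[g], x) for g in "zrh"} for x in E]
    h = [0.0] * hidden
    for t in tokens:
        m = table[t]
        h = gru_step(m["z"], m["r"], m["h"], U, h)
    return h
```

Calling `run_baseline` and `run_memoized` on the same token sequence should yield the same hidden state, since the lookup table holds exactly the products the baseline recomputes at every step. The trade-off in this sketch is memory for the table (three vectors per vocabulary entry) against per-step compute.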
Year
2016
DOI
10.1109/FPL.2016.7577314
Venue
2016 26th International Conference on Field Programmable Logic and Applications (FPL)
Keywords
recurrent neural networks, analytics servers, FPGA, GPU, ASIC, RNN, gated recurrent unit, GRU, memoization optimization, dense matrix vector multiplications, SGEMV, multicore CPU, on-chip BRAM, DSP, reconfigurable fabric, fine-grained parallelisms, field programmable gate array, graphics processing unit
Field
Logic gate, Computer science, Server, Parallel computing, Field-programmable gate array, Recurrent neural network, Application-specific integrated circuit, Real-time computing, Analytics, Memoization, Sparse matrix
DocType
Conference
ISSN
1946-1488
ISBN
978-1-5090-0851-3
Citations
21
PageRank
1.52
References
9
Authors
6
Name                 Order  Citations  PageRank
Eriko Nurvitadhi     1      399        33.08
Jaewoong Sim         2      384        17.25
David Sheffield      3      33         3.54
Asit K. Mishra       4      1216       46.21
Srivatsan Krishnan   5      96         6.86
Debbie Marr          6      175        12.39