Title
Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
Abstract
BERT is a recent Transformer-based model that achieves state-of-the-art performance on various NLP tasks. In this paper, we investigate the hardware acceleration of BERT on FPGA for edge computing. To tackle the issue of huge computational complexity and memory footprint, we propose to fully quantize BERT (FQ-BERT), including weights, activations, softmax, layer normalization, and all t...
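The abstract centers on fully quantizing BERT's weights, activations, and nonlinear operations for FPGA deployment. For orientation only, the sketch below shows generic symmetric linear quantization in Python; the function names, 8-bit width, and clipping scheme are illustrative assumptions and not the paper's actual FQ-BERT quantizer, which additionally covers softmax and layer normalization.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, num_bits: int = 8):
    """Symmetric linear quantization to signed integers.

    Returns (q, scale) such that q * scale approximates x.
    A generic scheme for illustration, not the exact FQ-BERT quantizer.
    """
    qmax = 2 ** (num_bits - 1) - 1                        # e.g. 127 for 8-bit
    scale = max(float(np.max(np.abs(x))) / qmax, 1e-12)   # guard against an all-zero tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

# Quantize a toy weight matrix and check the reconstruction error,
# which is bounded by scale / 2 per element for in-range values.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric(w, num_bits=8)
print(np.max(np.abs(w - dequantize(q, s))))
```

Replacing floating-point tensors with small integers like this is what enables the memory-footprint and energy-efficiency gains on FPGA that the abstract claims, since integer multiply-accumulates map directly onto DSP slices.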
Year
2021
DOI
10.23919/DATE51398.2021.9474043
Venue
2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Keywords
Computational modeling, Bit error rate, Graphics processing units, Natural language processing, Energy efficiency, Task analysis, Computational complexity
DocType
Conference
ISSN
Design, Automation & Test in Europe (DATE) 2021
ISBN
978-3-9819263-5-4
Citations
1
PageRank
0.43
References
0
Authors
3
Name        Order  Citations  PageRank
Zejian Liu  1      2          1.14
Gang Li     2      1          1.45
Jian Cheng  3      1327       115.72