Abstract |
---|
With the growing desire to democratize deep learning, there is increasing demand to deploy Transformer-based natural language processing (NLP) models on resource-constrained devices with low latency and high accuracy. Existing BERT pruning methods require domain experts to heuristically handcraft hyperparameters to strike a balance among model size, latency, and accuracy. In this work, we propose AE-BERT, an automatic and efficient BERT pruning framework with an efficient evaluation scheme to select a "good" sub-network candidate (with high accuracy) given overall pruning ratio constraints. Our proposed method requires no human expert experience and achieves better accuracy on many NLP tasks. Our experimental results on the General Language Understanding Evaluation (GLUE) benchmark show that AE-BERT outperforms state-of-the-art (SOTA) hand-crafted pruning methods on BERT. On QNLI and RTE, we obtain 75% and 42.8% higher overall pruning ratios, respectively, while achieving higher accuracy. On MRPC, we obtain a 4.6-point higher score than the SOTA at the same overall pruning ratio of 0.5. On STS-B, we achieve a 40% higher pruning ratio with only a very small loss in Spearman correlation compared to SOTA hand-crafted pruning methods. Experimental results also show that, after model compression, the inference of a single BERT<sub>BASE</sub> encoder on a Xilinx Alveo U200 FPGA board achieves a 1.83× speedup over an Intel(R) Xeon(R) Gold 5218 (2.30 GHz) CPU, demonstrating the feasibility of deploying the BERT<sub>BASE</sub> sub-networks generated by the proposed method on computation-restricted devices. |
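The abstract describes selecting a "good" sub-network candidate under an overall pruning ratio constraint. Below is a minimal, hypothetical sketch of such a candidate-generation-and-evaluation loop in PyTorch, using plain magnitude pruning; all names (`sample_candidate`, `apply_masks`, `search`, `evaluate`) and the sampling heuristic are illustrative assumptions, not the paper's actual AE-BERT algorithm.

```python
# Hypothetical sketch: sample per-layer pruning ratios that satisfy an overall
# pruning-ratio constraint, apply magnitude pruning, and keep the candidate
# that scores best under a user-supplied `evaluate` function.
import copy
import torch
import torch.nn as nn

def prunable_layers(model: nn.Module):
    """Collect the Linear layers whose weights are candidates for pruning."""
    return [m for m in model.modules() if isinstance(m, nn.Linear)]

def sample_candidate(layers, overall_ratio: float, jitter: float = 0.1):
    """Sample per-layer ratios whose parameter-weighted mean matches the
    requested overall pruning ratio (an assumed sampling heuristic)."""
    sizes = torch.tensor([float(l.weight.numel()) for l in layers])
    ratios = (overall_ratio + jitter * (2 * torch.rand(len(layers)) - 1)).clamp(0.0, 0.95)
    current = ((ratios * sizes).sum() / sizes.sum()).clamp_min(1e-8)
    return (ratios * overall_ratio / current).clamp(0.0, 0.95)

def apply_masks(model: nn.Module, ratios):
    """Zero out the smallest-magnitude weights in each layer, in place."""
    for layer, r in zip(prunable_layers(model), ratios):
        w = layer.weight.data
        k = int(r * w.numel())
        if k > 0:
            thresh = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > thresh).float())

def search(model, evaluate, overall_ratio, n_candidates=16):
    """Try several sub-network candidates and keep the best-scoring one."""
    best_score, best_model = float("-inf"), None
    for _ in range(n_candidates):
        cand = copy.deepcopy(model)
        apply_masks(cand, sample_candidate(prunable_layers(model), overall_ratio))
        score = evaluate(cand)  # e.g., accuracy on a small dev split
        if score > best_score:
            best_score, best_model = score, cand
    return best_model, best_score
```

In practice, `evaluate` would score each candidate on a held-out GLUE dev split, and the selected sub-network would then be fine-tuned before deployment.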
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/ISQED54688.2022.9806197 | 2022 23rd International Symposium on Quality Electronic Design (ISQED) |
Keywords | DocType | ISSN
---|---|---|
Transformer, deep learning, pruning, acceleration | Conference | 1948-3287

ISBN | Citations | PageRank
---|---|---|
978-1-6654-9467-0 | 0 | 0.34

References | Authors
---|---|
8 | 8
Name | Order | Citations | PageRank |
---|---|---|---|
Shaoyi Huang | 1 | 2 | 2.44 |
Ning Liu | 2 | 2 | 0.70 |
Yueying Liang | 3 | 0 | 0.34 |
Hongwu Peng | 4 | 2 | 1.76 |
Hongjia Li | 5 | 7 | 5.91 |
Dongkuan Xu | 6 | 0 | 0.68 |
Mimi Xie | 7 | 8 | 1.88 |
Caiwen Ding | 8 | 142 | 26.52 |