Title
BMInf: An Efficient Toolkit for Big Model Inference and Tuning
Abstract
In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks. Although these big models can be pre-trained by stacking computing clusters at any cost, it is impractical to devote such huge computing resources to applying them to each downstream task. To address the computation bottleneck encountered when deploying big models in real-world scenarios, we introduce an open-source toolkit for Big Model Inference and tuning (BMInf), which supports big model inference and tuning at extremely low computation cost. More specifically, at the algorithm level, we introduce model quantization and parameter-efficient tuning for efficient model inference and tuning. At the implementation level, we apply model offloading, model checkpointing, and CPU-GPU scheduling optimization to further reduce the computation and memory cost of big models. Based on the above efforts, we can efficiently perform big model inference and tuning with a single GPU (even a consumer-level GPU such as the GTX 1060) instead of computing clusters, which is difficult for existing distributed learning toolkits for PLMs. BMInf is publicly released at https://github.com/OpenBMB/BMInf.
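As context for the quantization technique named in the abstract, below is a minimal sketch of int8 weight quantization in PyTorch. The per-row absmax scaling scheme and the function names are illustrative assumptions for exposition, not BMInf's actual implementation.

```python
import torch

def quantize_int8(weight: torch.Tensor):
    # Illustrative per-row absmax scaling (not BMInf's exact scheme):
    # one fp16 scale per row, int8 payload for the weights.
    scale = weight.abs().max(dim=-1, keepdim=True).values / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale.to(torch.half)

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover approximate fp16 weights on demand, e.g. just before a matmul.
    return q.to(torch.half) * scale

# Round-trip a toy weight matrix; the reconstruction error stays small.
w = torch.randn(4, 8)
q, s = quantize_int8(w)
err = (w - dequantize_int8(q, s).float()).abs().max()
print(f"int8 storage: {q.nelement()} bytes, max abs error: {err:.4f}")
```

Storing weights as int8 plus one fp16 scale per row roughly quarters the memory footprint relative to fp32, which is the kind of saving that helps big-model inference fit on a single consumer-level GPU.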
Year
2022
DOI
10.18653/v1/2022.acl-demo.22
Venue
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022): Proceedings of System Demonstrations
DocType
Conference
Volume
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Citations
0
PageRank
0.34
References
1
Authors
9
Name            Order  Citations  PageRank
Xu Han          1      15         4.94
Guoyang Zeng    2      1          1.71
Weilin Zhao     3      0          0.68
Zhiyuan Liu     4      2037       123.68
Zhengyan Zhang  5      105        8.78
Jie Zhou        6      13         11.09
Jun Zhang       7      1102       188.11
Jia Chao        8      0          0.34
Maosong Sun     9      2293       162.86