Title
BaGuaLu: targeting brain scale pretrained models with over 37 million cores
Abstract
Large-scale pretrained AI models have demonstrated state-of-the-art accuracy in a range of important applications. As the size of pretrained AI models grows dramatically each year in pursuit of higher accuracy, training such models requires massive computing and memory capabilities, which accelerates the convergence of AI and HPC. However, gaps remain in deploying AI applications on HPC systems, which call for application and system co-design based on specific hardware features. To this end, this paper proposes BaGuaLu, the first work targeting the training of brain scale models on an entire exascale supercomputer, the New Generation Sunway Supercomputer. By combining hardware-specific intra-node optimizations with hybrid parallel strategies, BaGuaLu achieves good performance and scalability on unprecedentedly large models. The evaluation shows that BaGuaLu can train 14.5-trillion-parameter models at over 1 EFLOPS using mixed precision and has the capability to train 174-trillion-parameter models, a count that rivals the number of synapses in a human brain.
Year
2022
DOI
10.1145/3503221.3508417
Venue
Principles and Practice of Parallel Programming
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
25
Name | Order | Citations | PageRank
Zixuan Ma | 1 | 0 | 1.01
Jiaao He | 2 | 2 | 1.46
Jiezhong Qiu | 3 | 268 | 12.48
Huanqi Cao | 4 | 0 | 1.35
Yuanwei Wang | 5 | 0 | 0.34
Zhenbo Sun | 6 | 0 | 1.01
Liyan Zheng | 7 | 0 | 1.35
Haojie Wang | 8 | 2 | 3.75
Shizhi Tang | 9 | 0 | 1.35
Tianyu Zheng | 10 | 0 | 0.34
Junyang Lin | 11 | 0 | 0.34
Guanyu Feng | 12 | 0 | 1.35
Zeqiang Huang | 13 | 0 | 0.34
Jie Gao | 14 | 0 | 0.34
Aohan Zeng | 15 | 0 | 0.34
Jianwei Zhang | 16 | 0 | 0.34
Runxin Zhong | 17 | 0 | 0.68
Tianhui Shi | 18 | 0 | 0.68
Sha Liu | 19 | 0 | 0.34
Weimin Zheng | 20 | 1889 | 182.48
Jie Tang | 21 | 5871 | 300.22
Hongxia Yang | 22 | 271 | 35.55
Xin Liu | 23 | 0 | 0.34
Jidong Zhai | 24 | 340 | 36.27
Wenguang Chen | 25 | 1014 | 70.57