Title
ZeRO-Offload: Democratizing Billion-Scale Model Training
Abstract
Large-scale model training has been a playing ground for a limited few, because it often requires complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes the large model training landscape by making large model training accessible to nearly everyone. It can train models with over 13 billion parameters on a single GPU, a 10x increase in size compared to popular frameworks such as PyTorch, and it does so without requiring any model change from data scientists or sacrificing computational efficiency. ZeRO-Offload enables large model training by offloading data and compute to the CPU. To preserve compute efficiency, it is designed to minimize data movement to/from the GPU and reduce CPU compute time while maximizing memory savings on the GPU. As a result, ZeRO-Offload can achieve 40 TFlops/GPU on a single NVIDIA V100 GPU for a 10B-parameter model, compared to 30 TFlops using PyTorch alone for a 1.4B-parameter model, the largest that can be trained without running out of GPU memory. ZeRO-Offload is also designed to scale on multiple GPUs when available, offering near-linear speedup on up to 128 GPUs. Additionally, it can work together with model parallelism to train models with over 70 billion parameters on a single DGX-2 box, a 4.5x increase in model size compared to using model parallelism alone. By combining compute and memory efficiency with ease of use, ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU.
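As a rough illustration of the offload pattern the abstract describes (not ZeRO-Offload's actual implementation, which ships with DeepSpeed), the following PyTorch-style sketch keeps the fp16 forward/backward pass on the GPU while fp32 master weights, optimizer states, and the Adam update live in CPU memory; the helper and variable names here are illustrative assumptions.

```python
import torch

# Sketch of the CPU-offload idea (assumed helper, not ZeRO-Offload's API):
# fp16 compute stays on the GPU; fp32 master weights, optimizer states, and
# the optimizer step live on the CPU, so GPU memory holds only parameters,
# activations, and gradients.
def offloaded_step(model, cpu_master_params, cpu_optimizer, loss):
    loss.backward()  # backward pass runs on the GPU

    # Stream gradients to CPU as fp32 (ZeRO-Offload overlaps such transfers
    # with GPU computation).
    for gpu_p, cpu_p in zip(model.parameters(), cpu_master_params):
        cpu_p.grad = gpu_p.grad.detach().to("cpu", dtype=torch.float32)

    cpu_optimizer.step()          # parameter update executes on the CPU
    cpu_optimizer.zero_grad()
    model.zero_grad(set_to_none=True)

    # Copy the updated fp32 masters back into the GPU fp16 parameters.
    with torch.no_grad():
        for gpu_p, cpu_p in zip(model.parameters(), cpu_master_params):
            gpu_p.copy_(cpu_p.to(gpu_p.device, dtype=gpu_p.dtype))

# Hypothetical setup: fp16 model on the GPU, fp32 master copies and Adam on CPU.
model = torch.nn.Linear(1024, 1024).half().cuda()
cpu_master_params = [p.detach().float().cpu().requires_grad_(True)
                     for p in model.parameters()]
cpu_optimizer = torch.optim.Adam(cpu_master_params, lr=1e-4)

x = torch.randn(8, 1024, device="cuda", dtype=torch.float16)
loss = model(x).float().pow(2).mean()
offloaded_step(model, cpu_master_params, cpu_optimizer, loss)
```

In the actual system, the CPU-GPU transfers are overlapped with GPU computation and the CPU-side update uses an optimized Adam implementation, which is how ZeRO-Offload preserves throughput without requiring any change to the model itself.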
Year: 2021
Venue: Proceedings of the 2021 USENIX Annual Technical Conference
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 8

Name                      Order  Citations  PageRank
Jie Ren                   1      51         17.62
Samyam Rajbhandari        2      23         3.79
Reza Yazdani Aminabadi    3      0          0.68
Olatunji Ruwase           4      167        14.40
Shuangyan Yang            5      0          0.34
Minjia Zhang              6      2          4.08
Dong Li                   7      764        48.56
Yuxiong He                8      666        40.52