Abstract
---|
Large-scale model training has been a playing ground for a limited few, because it often requires complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes the large model training landscape by making large model training accessible to nearly everyone. It can train models with over 13 billion parameters on a single GPU, a 10x increase in size compared to popular frameworks such as PyTorch, and it does so without requiring any model change from data scientists or sacrificing computational efficiency.

ZeRO-Offload enables large model training by offloading data and compute to the CPU. To preserve compute efficiency, it is designed to minimize data movement to and from the GPU and to reduce CPU compute time while maximizing memory savings on the GPU. As a result, ZeRO-Offload can achieve 40 TFlops/GPU on a single NVIDIA V100 GPU for a 10B-parameter model, compared to 30 TFlops using PyTorch alone for a 1.4B-parameter model, the largest that can be trained without running out of GPU memory. ZeRO-Offload is also designed to scale on multiple GPUs when available, offering near-linear speedup on up to 128 GPUs. Additionally, it can work together with model parallelism to train models with over 70 billion parameters on a single DGX-2 box, a 4.5x increase in model size compared to using model parallelism alone.

By combining compute and memory efficiency with ease of use, ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU.
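The abstract's core idea, keeping optimizer states and the optimizer step in CPU memory while only gradients and updated parameters cross the GPU-CPU boundary, can be illustrated with a toy sketch. This is not the paper's implementation (ZeRO-Offload works inside DeepSpeed with pinned-memory transfers and an optimized CPU Adam); the class name `CPUOffloadAdam` and its structure are hypothetical, and plain Python lists stand in for GPU and CPU tensors.

```python
import math

class CPUOffloadAdam:
    """Toy sketch of the offload pattern: Adam states live on the "CPU";
    each step receives "GPU" gradients and returns updated "GPU" params.
    (Hypothetical class; not the actual ZeRO-Offload API.)"""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.lr, (self.b1, self.b2), self.eps = lr, betas, eps
        self.t = 0
        # fp32 master weights and optimizer states stay in CPU memory,
        # which is what frees up GPU memory for larger models
        self.cpu_params = [list(p) for p in params]
        self.m = [[0.0] * len(p) for p in params]
        self.v = [[0.0] * len(p) for p in params]

    def step(self, gpu_grads):
        # 1) gradients transferred GPU -> CPU (a plain copy stands in
        #    for the PCIe transfer)
        cpu_grads = [list(g) for g in gpu_grads]
        self.t += 1
        # 2) Adam update runs entirely on the CPU
        for p, g, m, v in zip(self.cpu_params, cpu_grads, self.m, self.v):
            for i, gi in enumerate(g):
                m[i] = self.b1 * m[i] + (1 - self.b1) * gi
                v[i] = self.b2 * v[i] + (1 - self.b2) * gi * gi
                m_hat = m[i] / (1 - self.b1 ** self.t)
                v_hat = v[i] / (1 - self.b2 ** self.t)
                p[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        # 3) only the updated parameters are transferred CPU -> GPU
        return [list(p) for p in self.cpu_params]
```

The GPU never holds the momentum and variance buffers, which for Adam are twice the size of the parameters themselves; that is the memory saving the abstract refers to, at the cost of the per-step gradient and parameter transfers the design tries to minimize.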
Year | Venue | DocType |
---|---|---|
2021 | PROCEEDINGS OF THE 2021 USENIX ANNUAL TECHNICAL CONFERENCE | Conference |
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0
Authors (8)
Name | Order | Citations | PageRank |
---|---|---|---|
Jie Ren | 1 | 51 | 17.62 |
Samyam Rajbhandari | 2 | 23 | 3.79 |
Reza Yazdani Aminabadi | 3 | 0 | 0.68 |
Olatunji Ruwase | 4 | 167 | 14.40 |
Shuangyan Yang | 5 | 0 | 0.34 |
Minjia Zhang | 6 | 2 | 4.08 |
Dong Li | 7 | 764 | 48.56
Yuxiong He | 8 | 666 | 40.52 |