Abstract
The explosive growth of Deep Neural Network (DNN) model sizes creates a pressing need for larger memory capacity. This trend is especially pronounced for models in natural language processing (NLP), a dominant application of AI along with computer vision. For example, GPT-3, a recent extreme-scale language model from OpenAI, has over 175 billion parameters. Furthermore, such a model consists mostly of FC layers with huge dimensions and thus has relatively high arithmetic intensity. In that sense, an extreme-scale language model is a poor fit for the conventional HBM DRAM-based memory system, which lacks capacity while offering extremely high bandwidth. For this reason, we propose pairing the neural network training accelerator with a flash-based memory system instead of an HBM DRAM-based one. To design an effective flash-based memory system, we optimize the existing SSD design to improve both bandwidth and endurance. Finally, we evaluate our proposed platform, Behemoth, and show that it achieves a 3.65× cost saving over TPU v3 and a 2.05× training throughput improvement over an accelerator attached to a commercial SSD.
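To make the arithmetic-intensity claim concrete, the sketch below estimates FLOPs per byte for a single FC-layer matrix multiply at GPT-3 scale. The dimensions (d_model = 12288, FFN dimension 4 × d_model, a 2048-token batch, fp16 operands) and the simple cost model are illustrative assumptions, not figures taken from the paper.

```python
# Back-of-the-envelope arithmetic intensity of one large FC (GEMM) layer.
# Dimensions below are GPT-3-scale assumptions for illustration only;
# they are not measurements from the Behemoth paper.

def gemm_arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul with fp16 operands."""
    flops = 2 * m * k * n                                    # one multiply + one add per MAC
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem   # read A and B, write C
    return flops / bytes_moved

tokens, d_model, d_ffn = 2048, 12288, 4 * 12288  # assumed FC layer of a GPT-3-scale block
ai = gemm_arithmetic_intensity(tokens, d_model, d_ffn)
print(f"~{ai:.0f} FLOPs/byte")  # roughly 1,700 FLOPs/byte
```

At roughly 1,700 FLOPs per byte under these assumptions, such a layer sits far above the compute/bandwidth balance point of typical HBM-attached accelerators, so memory capacity rather than bandwidth becomes the binding constraint; this is the intuition behind the abstract's case for a flash-based memory system.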
Year | Venue | DocType
---|---|---
2021 | Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST '21) | Conference
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors (6)
Name | Order | Citations | PageRank |
---|---|---|---|
Shine Kim | 1 | 0 | 1.01 |
Yunho Jin | 2 | 1 | 1.71 |
Gina Sohn | 3 | 0 | 0.34 |
Jonghyun Bae | 4 | 2 | 3.07 |
Tae Jun Ham | 5 | 4 | 3.76 |
Jae W. Lee | 6 | 607 | 52.37 |