Title
DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models
Abstract
In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and in scientific applications. A common pattern in such applications is frequent checkpointing of the state of the learning model during training, needed in a variety of scenarios: analysis of intermediate states to explain features and correlations with the training data, exploration strategies involving alternative models that share a common ancestor, knowledge transfer, resilience, etc. However, with the increasing size of learning models and the popularity of distributed data-parallel training approaches, the simple checkpointing techniques used so far face several limitations: low serialization performance, blocking I/O, and stragglers caused by a single process performing all checkpointing. This paper proposes a checkpointing technique specifically designed to address these limitations, introducing efficient asynchronous techniques that hide the overhead of serialization and I/O and distribute the load over all participating processes. Experiments with two deep learning applications (CANDLE and ResNet) on a pre-Exascale HPC platform (Theta) show significant improvement over the state of the art, both in checkpointing duration and in runtime overhead.
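The core idea named in the abstract (hiding serialization and I/O behind training progress) can be illustrated by a minimal sketch. This is not the paper's implementation: the function name, the in-memory snapshot via `copy.deepcopy`, and `pickle` as the serializer are illustrative assumptions; the paper distributes this work across all data-parallel processes rather than a single thread.

```python
import copy
import pickle
import threading

def async_checkpoint(model_state, path):
    """Illustrative sketch of asynchronous checkpointing (not the
    paper's implementation): take an in-memory snapshot of the model
    state, then serialize and persist it on a background thread so
    the training loop is not blocked by I/O."""
    snapshot = copy.deepcopy(model_state)  # cheap relative to disk I/O

    def _persist():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    t = threading.Thread(target=_persist, daemon=True)
    t.start()
    return t  # caller joins only when checkpoint durability matters

# Toy usage: "model state" is just a dict of parameter lists here.
state = {"layer1": [0.1, 0.2], "layer2": [0.3]}
t = async_checkpoint(state, "/tmp/ckpt.pkl")
state["layer1"][0] = 9.9       # training continues immediately
t.join()                       # wait for the checkpoint to land

with open("/tmp/ckpt.pkl", "rb") as f:
    restored = pickle.load(f)
# The snapshot was taken before the update, so restored["layer1"][0] is 0.1
```

The key design point, also reflected in the abstract, is decoupling the consistent snapshot (taken synchronously, but fast) from serialization and persistence (slow, overlapped with training).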
Year
2020
DOI
10.1109/CCGrid49817.2020.00-76
Venue
2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
Keywords
checkpointing, deep learning, fine-grain asynchronous I/O, multi-level data persistence
DocType
Conference
ISBN
978-1-7281-6095-5
Citations
3
PageRank
0.40
References
0
Authors
6

Name               Order  Citations  PageRank
Bogdan Nicolae     1      392        29.51
Jiali Li           2      3          0.40
Justin M. Wozniak  3      464        35.32
George Bosilca     4      1916       140.48
Matthieu Dorier    5      131        13.91
Franck Cappello    6      3775       251.47