Abstract
---
Deep learning models are becoming larger and will not fit in the limited memory of accelerators such as GPUs for training. Though many methods have been proposed to solve this problem, they are rather ad-hoc in nature and difficult to extend and integrate with other techniques. In this paper, we tackle the problem in a formal way to provide a strong foundation for supporting large models. We propose a method of formally rewriting the computational graph of a model where swap-out and swap-in operations are inserted to temporarily store intermediate results on CPU memory. By introducing a categorized topological ordering for simulating graph execution, the memory consumption of a model can be easily analyzed by using operation distances in the ordering. As a result, the problem of fitting a large model into a memory-limited accelerator is reduced to the problem of reducing operation distances in a categorized topological ordering. We then show how to formally derive swap-out and swap-in operations from an existing graph and present rules to optimize the graph. Finally, we propose a simulation-based auto-tuning to automatically find suitable graph-rewriting parameters for the best performance. We developed a module in TensorFlow, called LMS, by which we successfully trained ResNet-50 with a 4.9x larger mini-batch size and 3D U-Net with a 5.6x larger image resolution.
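The abstract reduces the memory-fitting problem to reducing producer-consumer "operation distances" in a topological ordering of the computational graph. The following is a minimal conceptual sketch of that idea (not the paper's LMS implementation, and the graph, names, and threshold are invented for illustration): it topologically orders a toy graph with Kahn's algorithm and flags tensors whose producer and consumer are far apart as swap-out/swap-in candidates.

```python
from collections import deque

def topological_order(graph):
    """Kahn's algorithm; graph maps each op to the ops it depends on."""
    indeg = {op: len(deps) for op, deps in graph.items()}
    consumers = {op: [] for op in graph}
    for op, deps in graph.items():
        for d in deps:
            consumers[d].append(op)
    queue = deque(op for op, d in indeg.items() if d == 0)
    order = []
    while queue:
        op = queue.popleft()
        order.append(op)
        for c in consumers[op]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order

def swap_candidates(graph, threshold):
    """Return (producer, consumer, distance) triples whose distance in the
    topological ordering exceeds the threshold."""
    pos = {op: i for i, op in enumerate(topological_order(graph))}
    out = []
    for op, deps in graph.items():
        for d in deps:
            dist = pos[op] - pos[d]
            if dist > threshold:
                out.append((d, op, dist))
    return out

# Toy graph resembling a forward/backward pass: 'grad1' reuses the
# activation produced early by 'conv1', giving it a long distance.
toy = {
    "conv1": [],
    "relu1": ["conv1"],
    "conv2": ["relu1"],
    "loss":  ["conv2"],
    "grad2": ["loss"],
    "grad1": ["grad2", "conv1"],  # long-lived dependency on conv1
}
print(swap_candidates(toy, threshold=3))  # → [('conv1', 'grad1', 5)]
```

In the paper's formulation, such a long-lived tensor would be rewritten so that it is swapped out to CPU memory after `conv1` and swapped back in before `grad1`, shrinking the accelerator's peak memory footprint.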
Year | DOI | Venue
---|---|---
2019 | 10.1145/3315573.3329984 | Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management
Keywords | Field | DocType
---|---|---
GPU memory, computational graphs, large neural models, simulator | Graph, Computer science, Topological sorting, Parallel computing, Memory management, Artificial intelligence, Rewriting, Deep learning, Image resolution | Conference

ISBN | Citations | PageRank
---|---|---
978-1-4503-6722-6 | 0 | 0.34

References | Authors
---|---
0 | 4
Name | Order | Citations | PageRank
---|---|---|---
Tung D. Le | 1 | 2 | 2.08
Haruki Imai | 2 | 0 | 1.35
Yasushi Negishi | 3 | 36 | 6.36
K. Kawachiya | 4 | 145 | 16.81