Title
Demystify Hyperparameters for Stochastic Optimization with Transferable Representations
Abstract
This paper studies the convergence and generalization of a large class of Stochastic Gradient Descent (SGD) momentum schemes, in both learning from scratch and transferring representations with fine-tuning. Momentum-based acceleration of SGD is the default optimizer for many deep learning models. However, there is a lack of general convergence guarantees for many existing momentum variants in conjunction with stochastic gradients. It is also unclear how momentum methods may affect the generalization error. In this paper, we give a unified analysis of several popular optimizers, e.g., Polyak's heavy ball momentum and Nesterov's accelerated gradient. Our contribution is threefold. First, we give a unified convergence guarantee for a large class of momentum variants in the stochastic setting. Notably, our results cover both convex and nonconvex objectives. Second, we prove a generalization bound for neural networks trained by momentum variants. We analyze how hyperparameters affect the generalization bound and consequently propose guidelines on how to tune these hyperparameters in various momentum schemes to generalize well. We provide extensive empirical evidence for our proposed guidelines. Third, this study fills the gap left by the lack of a formal analysis of fine-tuning in the literature. To the best of our knowledge, our work is the first systematic generalizability analysis of momentum methods that covers both learning from scratch and fine-tuning. Our code is available at https://github.com/jsycsjh/Demystify-Hyperparameters-for-Stochastic-Optimization-with-Transferable-Representations .
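To make the class of optimizers concrete, below is a minimal, hypothetical NumPy sketch (not taken from the paper's repository) of the two momentum schemes named in the abstract, Polyak's heavy-ball and Nesterov's accelerated gradient, run on a toy quadratic objective; the step size, momentum coefficient, and function names are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch only: generic heavy-ball and Nesterov momentum updates
# on a deterministic toy quadratic, not the paper's experimental code.
import numpy as np

def grad(x, A, b):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x."""
    return A @ x - b

def heavy_ball(x0, A, b, lr=0.05, mu=0.9, steps=200):
    """Polyak's heavy ball: v <- mu*v - lr*grad(x); x <- x + v."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = mu * v - lr * grad(x, A, b)
        x = x + v
    return x

def nesterov(x0, A, b, lr=0.05, mu=0.9, steps=200):
    """Nesterov's accelerated gradient: gradient taken at the look-ahead point x + mu*v."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = mu * v - lr * grad(x + mu * v, A, b)
        x = x + v
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0])          # ill-conditioned quadratic
    b = rng.standard_normal(2)
    x0 = rng.standard_normal(2)
    print("heavy ball:", heavy_ball(x0, A, b))
    print("nesterov  :", nesterov(x0, A, b))
    print("optimum   :", np.linalg.solve(A, b))
```

The two schemes differ only in where the gradient is evaluated (the current iterate versus the momentum look-ahead point), which is the kind of structural variation the paper's unified analysis is meant to cover.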
Year: 2022
DOI: 10.1145/3534678.3539298
Venue: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
DocType: Conference
Citations: 0
PageRank: 0.34
References: 4
Authors: 4
Name          Order  Citations  PageRank
Jianhui Sun   1      0          0.34
Mengdi Huai   2      29         10.02
Kishlay Jha   3      49         7.83
Aidong Zhang  4      2970       405.63