Title |
---|
Demystify Hyperparameters for Stochastic Optimization with Transferable Representations |
Abstract |
---|
This paper studies the convergence and generalization of a large class of Stochastic Gradient Descent (SGD) momentum schemes, both when learning from scratch and when transferring representations with fine-tuning. Momentum-based acceleration of SGD is the default optimizer for many deep learning models. However, general convergence guarantees are lacking for many existing momentum variants in conjunction with stochastic gradients. It is also unclear how momentum methods may affect the generalization error. In this paper, we give a unified analysis of several popular optimizers, e.g., Polyak's heavy ball momentum and Nesterov's accelerated gradient. Our contribution is threefold. First, we give a unified convergence guarantee for a large class of momentum variants in the stochastic setting. Notably, our results cover both convex and nonconvex objectives. Second, we prove a generalization bound for neural networks trained by momentum variants. We analyze how hyperparameters affect the generalization bound and consequently propose guidelines on how to tune these hyperparameters in various momentum schemes to generalize well. We provide extensive empirical evidence supporting our proposed guidelines. Third, this study fills the gap left by the lack of a formal analysis of fine-tuning in the literature. To the best of our knowledge, our work is the first systematic generalizability analysis of momentum methods that covers both learning from scratch and fine-tuning. Our code is available at https://github.com/jsycsjh/Demystify-Hyperparameters-for-Stochastic-Optimization-with-Transferable-Representations . |
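
For readers unfamiliar with the two optimizers named in the abstract, the following is a minimal sketch of the textbook forms of stochastic heavy ball and Nesterov momentum on a one-dimensional quadratic. It is not the paper's unified scheme or its released code. The function names, the parameters `lr`, `beta`, and `noise`, and the toy objective are all illustrative assumptions.

```python
import random


def sgd_momentum(grad, x0, lr=0.1, beta=0.9, steps=300,
                 nesterov=False, noise=0.0, seed=0):
    """Minimise a 1-D objective with heavy-ball or Nesterov momentum.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    x, v = x0, 0.0
    for _ in range(steps):
        # Nesterov evaluates the gradient at a look-ahead point,
        # heavy ball evaluates it at the current iterate.
        point = x + beta * v if nesterov else x
        # Additive Gaussian noise stands in for minibatch gradient noise.
        g = grad(point) + noise * rng.gauss(0.0, 1.0)
        v = beta * v - lr * g   # velocity update
        x = x + v               # parameter update
    return x


def quadratic_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * (x - 3)^2, minimised at x = 3.
    return x - 3.0


x_hb = sgd_momentum(quadratic_grad, x0=0.0, nesterov=False)
x_nag = sgd_momentum(quadratic_grad, x0=0.0, nesterov=True)
print(x_hb, x_nag)
```

With `noise=0.0` both variants settle near the minimiser at 3. Raising `noise` or changing `lr` and `beta` gives a rough feel for the hyperparameter sensitivity that the abstract says the paper analyses.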
Year | DOI | Venue |
---|---|---|
2022 | 10.1145/3534678.3539298 | KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors | |
---|---|---|
4 | 4 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Jianhui Sun | 1 | 0 | 0.34 |
Mengdi Huai | 2 | 29 | 10.02 |
Kishlay Jha | 3 | 49 | 7.83 |
Aidong Zhang | 4 | 2970 | 405.63 |