Abstract
---
Regularized nonlinear acceleration (RNA) is a generic extrapolation scheme for optimization methods, with marginal computational overhead. It aims to improve convergence using only the iterates of simple iterative algorithms. However, its application to optimization has so far been theoretically limited to gradient descent and other single-step algorithms. Here, we adapt RNA to a much broader setting, including stochastic gradient descent with momentum and Nesterov's fast gradient method. We use it to train deep neural networks and empirically observe that extrapolated networks are more accurate, especially in the early iterations. A straightforward application of our algorithm when training ResNet-152 on ImageNet produces a top-1 test error of 20.88%, improving on the reference classification pipeline by 0.8%. Furthermore, the code runs offline in this case, so it never negatively affects performance.
Year | Venue | Field
---|---|---
2018 | arXiv: Optimization and Control | Convergence (routing), Overhead (computing), Mathematical optimization, Gradient descent, Nonlinear system, Algorithm, Extrapolation, Momentum, Acceleration, Iterated function, Mathematics

DocType | Volume | Citations
---|---|---
Journal | abs/1805.09639 | 1

PageRank | References | Authors
---|---|---
0.41 | 5 | 4
Name | Order | Citations | PageRank
---|---|---|---
Damien Scieur | 1 | 1 | 1.09
Edouard Oyallon | 2 | 17 | 3.64
Alexandre d'Aspremont | 3 | 991 | 101.61
Francis Bach | 4 | 11490 | 622.29