Abstract
---
Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems involving numerical (inner-level) optimization loops. While ABLO has many applications across deep learning, it suffers from time and memory complexity proportional to the length r of its inner optimization loop. To address this complexity, an earlier first-order method (FOM) was proposed as a heuristic that omits second-derivative terms, yielding significant speed gains and requiring only constant memory. Despite FOM's popularity, there is a lack of theoretical understanding of its convergence properties. We contribute by theoretically characterizing FOM's gradient bias under mild assumptions. We further demonstrate a rich family of examples where FOM-based SGD does not converge to a stationary point of the ABLO objective. We address this concern by proposing an unbiased FOM (UFOM) enjoying constant memory complexity as a function of r. We characterize the introduced time-variance tradeoff, demonstrate convergence bounds, and find an optimal UFOM for a given ABLO problem. Finally, we propose an efficient adaptive UFOM scheme.
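As a rough illustration of the setting the abstract describes, the sketch below contrasts exact differentiation through an unrolled inner loop (memory and second derivatives scaling with r) against a first-order heuristic, shown here in the MAML-style special case where the outer parameters initialize the inner loop. It is a minimal sketch under stated assumptions, not the paper's exact formulation or its UFOM estimator; the toy losses and all names (`inner_loss`, `outer_loss`, `unroll`, `r`, `lr_in`) are hypothetical.

```python
# Minimal JAX sketch: exact ABLO hypergradient vs. a first-order heuristic.
# Illustrative only; losses, shapes, and hyperparameters are made up.
import jax
import jax.numpy as jnp

r, lr_in = 5, 0.1  # inner-loop length and inner step size (illustrative)

def inner_loss(phi, x):
    return jnp.mean((x @ phi - 1.0) ** 2)

def outer_loss(phi, x_val):
    return jnp.mean((x_val @ phi - 1.0) ** 2)

def unroll(theta, x_train):
    # r steps of gradient descent on the inner loss, initialized at theta
    phi = theta
    for _ in range(r):
        phi = phi - lr_in * jax.grad(inner_loss)(phi, x_train)
    return phi

def exact_hypergradient(theta, x_train, x_val):
    # Backpropagates through all r inner steps: memory grows with r and
    # second derivatives of the inner loss enter the chain rule.
    return jax.grad(lambda t: outer_loss(unroll(t, x_train), x_val))(theta)

def fom_gradient(theta, x_train, x_val):
    # First-order heuristic: run the inner loop without tracking gradients,
    # then use the outer gradient at phi_r as if d(phi_r)/d(theta) were the
    # identity. Constant memory in r, but biased in general.
    phi_r = jax.lax.stop_gradient(unroll(theta, x_train))
    return jax.grad(outer_loss)(phi_r, x_val)

theta = jnp.zeros(4)
x_train = jax.random.normal(jax.random.PRNGKey(0), (16, 4))
x_val = jax.random.normal(jax.random.PRNGKey(1), (16, 4))
print(exact_hypergradient(theta, x_train, x_val))
print(fom_gradient(theta, x_train, x_val))  # the gap illustrates FOM's gradient bias
```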
Year | Venue | DocType
---|---|---
2021 | INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | Conference

Volume | ISSN | Citations
---|---|---
139 | 2640-3498 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Valerii Likhosherstov | 1 | 0 | 2.03 |
Xingyou Song | 2 | 5 | 6.13 |
Krzysztof Choromanski | 3 | 124 | 23.56 |
Jared Quincy Davis | 4 | 0 | 0.68
Adrian Weller | 5 | 141 | 27.59 |