Abstract
---
Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on. We release an open source implementation of the meta-training algorithm.
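To make the architectural idea concrete, below is a minimal sketch of a learned per-parameter optimizer in NumPy: a small recurrent cell whose hidden state lives per parameter, consuming the current gradient and the previous update and emitting the next update. The class name, cell structure, feature set, and sizes are all illustrative assumptions, not the paper's hierarchical RNN or its released implementation, and the weights here are random rather than meta-trained.

```python
import numpy as np

class LearnedOptimizerSketch:
    """Hypothetical sketch of a learned per-parameter optimizer.

    Not the paper's architecture: a single flat RNN cell with random
    (un-meta-trained) weights, kept tiny to illustrate the low
    per-parameter memory overhead described in the abstract.
    """

    def __init__(self, hidden_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.h = hidden_size
        # Per-parameter inputs: recurrent state, gradient, previous update.
        self.W = rng.normal(0.0, 0.1, (hidden_size, hidden_size + 2))
        self.w_out = rng.normal(0.0, 0.1, hidden_size)

    def init_state(self, num_params):
        # One small hidden vector per parameter (the per-parameter overhead).
        return np.zeros((num_params, self.h))

    def step(self, params, grads, state, prev_update):
        # Concatenate recurrent state with per-parameter scalar features.
        x = np.concatenate([state, grads[:, None], prev_update[:, None]], axis=1)
        state = np.tanh(x @ self.W.T)   # simple RNN cell
        update = state @ self.w_out     # learned update for each parameter
        return params - update, state, update

# Toy usage: run the (untrained) optimizer on the quadratic loss sum(p**2).
opt = LearnedOptimizerSketch()
params = np.array([5.0, -3.0])
state = opt.init_state(params.size)
prev_update = np.zeros_like(params)
for _ in range(10):
    grads = 2.0 * params  # d/dp of sum(p**2)
    params, state, prev_update = opt.step(params, grads, state, prev_update)
```

In the paper's setting, the analogue of `W` and `w_out` would be meta-trained across an ensemble of small optimization tasks so that repeated `step` calls drive the loss down; that meta-training loop is the paper's contribution and is omitted from this sketch.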
Year | Venue | DocType |
---|---|---
2017 | ICML | Conference |
arXiv ID | Citations | PageRank
---|---|---
abs/1703.04813 | 16 | 0.80
References | Authors
---|---
10 | 7
Name | Order | Citations | PageRank |
---|---|---|---
Olga Wichrowska | 1 | 16 | 0.80 |
Niru Maheswaranathan | 2 | 54 | 10.47 |
Matt Hoffman | 3 | 227 | 14.27 |
Sergio Gomez Colmenarejo | 4 | 42 | 2.43 |
Misha Denil | 5 | 397 | 26.18 |
Nando De Freitas | 6 | 3284 | 273.68 |
Jascha Sohl-Dickstein | 7 | 673 | 82.82 |