Abstract | ||
---|---|---|
Incorporating second-order curvature information into machine learning optimization algorithms can be subtle, and doing so naively can lead to high per-iteration costs associated with forming the Hessian and performing the associated linear system solve. To address this, we introduce ADAHESSIAN, a new stochastic optimization algorithm. ADAHESSIAN directly incorporates approximate curvature information from the loss function, and it includes several novel performance-improving features: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) spatial averaging to reduce the variance of the second derivative; and (iii) a root-mean-square exponential moving average to smooth out variations of the second derivative across iterations. We perform extensive tests on NLP, CV, and recommendation system tasks, and ADAHESSIAN achieves state-of-the-art results. In particular, we find that ADAHESSIAN: (i) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14, 2.7/1.0 PPL on PTB/Wikitext-103; (ii) outperforms AdamW for Squeeze-Bert by 0.41 points on GLUE; (iii) achieves 1.45%/5.55% higher accuracy on ResNet32/ResNet18 on Cifar10/ImageNet as compared to Adam; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. The cost per iteration of ADAHESSIAN is comparable to first-order methods, and ADAHESSIAN exhibits improved robustness towards variations in hyperparameter values. The code for ADAHESSIAN is open-sourced and publicly available (Yao and Gholami 2020). |
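
The three ingredients named in the abstract can be illustrated with a short sketch. This is not the authors' released code; it is a minimal pure-Python illustration under stated assumptions: the Hessian is accessed only through a Hessian-vector product callable (here called `hvp`, a hypothetical name), the probe vectors are Rademacher (entries of +1 or -1), and "spatial averaging" is modeled as averaging over fixed-size contiguous blocks. The function names and the `block_size` and `beta2` parameters are illustrative choices, not taken from the record above.

```python
import random


def hutchinson_diagonal(hvp, dim, num_samples=1, rng=None):
    """Estimate diag(H) as the sample mean of z * (H z).

    `hvp` is an assumed callable that maps a list z to the list H z.
    With Rademacher z, E[z_i * (H z)_i] = H_ii, so the estimate is unbiased.
    """
    rng = rng or random.Random(0)
    estimate = [0.0] * dim
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        for i in range(dim):
            estimate[i] += z[i] * hz[i] / num_samples
    return estimate


def block_average(values, block_size):
    """Illustrative spatial averaging: replace each entry by its block mean."""
    averaged = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        mean = sum(block) / len(block)
        averaged.extend([mean] * len(block))
    return averaged


def rms_ema(state, values, beta2):
    """One exponential-moving-average update of the squared values.

    Taking the square root of the returned state gives a root-mean-square
    smoothed curvature estimate across iterations.
    """
    return [beta2 * s + (1.0 - beta2) * v * v for s, v in zip(state, values)]


# Usage on a toy diagonal Hessian: the off-diagonal terms are zero,
# so a single Rademacher probe recovers the diagonal exactly.
toy_diag = [1.0, 2.0, 3.0, 4.0]
diag_est = hutchinson_diagonal(
    lambda z: [d * zi for d, zi in zip(toy_diag, z)], dim=4
)
smoothed = block_average(diag_est, block_size=2)
second_moment = rms_ema([0.0] * 4, smoothed, beta2=0.999)
```

In a real training loop the Hessian-vector product would typically come from automatic differentiation rather than an explicit matrix, which is what keeps the per-iteration cost close to that of a first-order method.
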
Year | Venue | DocType |
---|---|---|
2021 | THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | Conference |
Volume | ISSN | Citations |
---|---|---|
35 | 2159-5399 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhewei Yao | 1 | 31 | 10.58 |
Amir Gholami | 2 | 66 | 12.99 |
Sheng Shen | 3 | 10 | 5.65 |
Mustafa Mustafa | 4 | 4 | 2.15 |
Kurt Keutzer | 5 | 5040 | 801.67 |
Michael W. Mahoney | 6 | 3297 | 218.10 |