Title
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks.
Abstract
Deep multitask networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their single-task counterparts. Such advantages can potentially lead to gains in both speed and performance, but multitask networks are also difficult to train without finding the right balance between tasks. We present a novel gradient normalization (GradNorm) technique which automatically balances the multitask loss function by directly tuning the gradients to equalize task training rates. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process which incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we hope to demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
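The abstract only summarizes the mechanism, so the following is a minimal PyTorch sketch of the balancing idea it describes: per-task gradient norms measured at the last shared layer are pulled toward a common scale, modulated by each task's relative inverse training rate (its loss ratio) raised to the asymmetry power $\alpha$. The toy two-task regression setup, the choice of shared layer, the learning rates, and all variable names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_tasks, in_dim = 2, 10

# Toy hard-parameter-sharing network: shared trunk plus one head per task (illustrative).
trunk = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
heads = nn.ModuleList([nn.Linear(64, 1) for _ in range(n_tasks)])
task_weights = nn.Parameter(torch.ones(n_tasks))   # the per-task loss weights being balanced

net_opt = torch.optim.Adam(list(trunk.parameters()) + list(heads.parameters()), lr=1e-3)
w_opt = torch.optim.Adam([task_weights], lr=1e-2)
alpha = 1.5                        # asymmetry hyperparameter (assumed value)
shared_layer = trunk[0].weight     # gradient norms measured at the last shared layer

x = torch.randn(256, in_dim)
targets = [torch.randn(256, 1), 10.0 * torch.randn(256, 1)]  # tasks on very different scales
criterion = nn.MSELoss()
initial_losses = None

for step in range(200):
    z = trunk(x)
    task_losses = torch.stack([criterion(h(z), t) for h, t in zip(heads, targets)])
    if initial_losses is None:
        initial_losses = task_losses.detach()

    weighted = task_weights * task_losses
    total_loss = weighted.sum()

    # Per-task gradient norms at the shared layer; create_graph keeps them
    # differentiable with respect to the task weights.
    norms = torch.stack([
        torch.autograd.grad(weighted[i], shared_layer,
                            retain_graph=True, create_graph=True)[0].norm()
        for i in range(n_tasks)
    ])

    with torch.no_grad():
        loss_ratios = task_losses.detach() / initial_losses   # proxy for inverse training rate
        inv_rate = loss_ratios / loss_ratios.mean()
        target = norms.mean() * inv_rate.pow(alpha)           # treated as a constant target

    gradnorm_loss = (norms - target).abs().sum()              # L1 gap between norms and targets
    w_grad = torch.autograd.grad(gradnorm_loss, task_weights, retain_graph=True)[0]

    # Update the network with the weighted task loss.
    net_opt.zero_grad()
    total_loss.backward()
    net_opt.step()

    # Update the task weights with the GradNorm gradient only, then renormalize
    # so the weights sum to the number of tasks.
    task_weights.grad = w_grad
    w_opt.step()
    with torch.no_grad():
        task_weights.data.mul_(n_tasks / task_weights.data.sum())
```

Run as a plain script, the weight on the small-scale task grows while the weight on the large-scale task shrinks, which is the qualitative behavior the abstract attributes to GradNorm; exact dynamics depend on the assumed optimizer settings above.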
Year
2018
Venue
International Conference on Machine Learning
DocType
Conference
Volume
abs/1711.02257
ISSN
Proceedings of the 35th International Conference on Machine Learning (2018), 793-802
Citations
13
PageRank
0.64
References
17
Authors
4
Name                   Order   Citations   PageRank
Zhao Chen              1       762         5.75
Vijay Badrinarayanan   2       14455       8.59
Chen-Yu Lee            3       138         7.27
Andrew Rabinovich      4       7965        7.29