Title
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks.
Abstract
Deep multitask networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their single-task counterparts. Such advantages can potentially lead to gains in both speed and performance, but multitask networks are also difficult to train without finding the right balance between tasks. We present a novel gradient normalization (GradNorm) technique which automatically balances the multitask loss function by directly tuning the gradients to equalize task training rates. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process which incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we hope to demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
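The abstract only summarizes the mechanism, so the following is a minimal PyTorch sketch of the balancing idea it describes: per-task gradient norms measured at the last shared layer are pulled toward a common scale, modulated by each task's relative inverse training rate (its loss ratio) raised to the asymmetry power $\alpha$. The toy two-task regression setup, the choice of shared layer, the learning rates, and all variable names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_tasks, in_dim = 2, 10

# Toy hard-parameter-sharing network: shared trunk plus one head per task (illustrative).
trunk = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
heads = nn.ModuleList([nn.Linear(64, 1) for _ in range(n_tasks)])
task_weights = nn.Parameter(torch.ones(n_tasks))   # the per-task loss weights being balanced

net_opt = torch.optim.Adam(list(trunk.parameters()) + list(heads.parameters()), lr=1e-3)
w_opt = torch.optim.Adam([task_weights], lr=1e-2)
alpha = 1.5                        # asymmetry hyperparameter (assumed value)
shared_layer = trunk[0].weight     # gradient norms measured at the last shared layer

x = torch.randn(256, in_dim)
targets = [torch.randn(256, 1), 10.0 * torch.randn(256, 1)]  # tasks on very different scales
criterion = nn.MSELoss()
initial_losses = None

for step in range(200):
    z = trunk(x)
    task_losses = torch.stack([criterion(h(z), t) for h, t in zip(heads, targets)])
    if initial_losses is None:
        initial_losses = task_losses.detach()

    weighted = task_weights * task_losses
    total_loss = weighted.sum()

    # Per-task gradient norms at the shared layer; create_graph keeps them
    # differentiable with respect to the task weights.
    norms = torch.stack([
        torch.autograd.grad(weighted[i], shared_layer,
                            retain_graph=True, create_graph=True)[0].norm()
        for i in range(n_tasks)
    ])

    with torch.no_grad():
        loss_ratios = task_losses.detach() / initial_losses   # proxy for inverse training rate
        inv_rate = loss_ratios / loss_ratios.mean()
        target = norms.mean() * inv_rate.pow(alpha)           # treated as a constant target

    gradnorm_loss = (norms - target).abs().sum()              # L1 gap between norms and targets
    w_grad = torch.autograd.grad(gradnorm_loss, task_weights, retain_graph=True)[0]

    # Update the network with the weighted task loss.
    net_opt.zero_grad()
    total_loss.backward()
    net_opt.step()

    # Update the task weights with the GradNorm gradient only, then renormalize
    # so the weights sum to the number of tasks.
    task_weights.grad = w_grad
    w_opt.step()
    with torch.no_grad():
        task_weights.data.mul_(n_tasks / task_weights.data.sum())
```

Run as a plain script, the weight on the small-scale task grows while the weight on the large-scale task shrinks, which is the qualitative behavior the abstract attributes to GradNorm; exact dynamics depend on the assumed optimizer settings above.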
Year
2018
Venue
International Conference on Machine Learning
DocType
Conference
Volume
abs/1711.02257
ISSN
Proceedings of the 35th International Conference on Machine Learning (2018), 793-802
Citations
13
PageRank
0.64
References
17
Authors
4
Name                   Order   Citations   PageRank
Zhao Chen              1       762         5.75
Vijay Badrinarayanan   2       14455       8.59
Chen-Yu Lee            3       138         7.27
Andrew Rabinovich      4       7965        7.29