arXiv 1711.02257

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

By Zhao Chen, Vijay Badrinarayanan, et al.

Published 2017-11-07


Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for…
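Below is a minimal Python sketch of a single GradNorm weight update. It follows our reading of the full paper's formulation, which this abstract does not spell out. Each task i has a loss weight w_i. G_i is the L2 norm of the gradient of w_i·L_i with respect to the shared weights W. r_i is task i's loss ratio L_i(t)/L_i(0) divided by the mean ratio across tasks. The weights take a step that reduces Σ|G_i − Ḡ·r_i^α|, with the target Ḡ·r_i^α held constant, and are then renormalized so they sum to the number of tasks. The function name `gradnorm_step`, the toy gradients and losses, the defaults for `alpha` and `lr`, and the positivity clamp are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def gradnorm_step(
    weights: List[float],                # current loss weights w_i (assumed > 0)
    task_grads: List[Sequence[float]],   # grad of unweighted L_i w.r.t. shared W
    losses: List[float],                 # L_i(t)
    initial_losses: List[float],         # L_i(0)
    alpha: float = 1.5,                  # assumed asymmetry hyperparameter
    lr: float = 0.025,                   # assumed step size for the weights
) -> List[float]:
    """One hypothetical GradNorm update of the task loss weights."""
    num_tasks = len(weights)
    grad_norms = [l2_norm(g) for g in task_grads]
    # G_i = ||grad_W (w_i * L_i)|| = w_i * ||grad_W L_i|| for w_i > 0
    G = [w * n for w, n in zip(weights, grad_norms)]
    G_mean = sum(G) / num_tasks

    # Relative inverse training rate r_i = (L_i(t)/L_i(0)) / mean ratio
    ratios = [l / l0 for l, l0 in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / num_tasks
    r = [ratio / mean_ratio for ratio in ratios]

    # Target gradient norms, treated as constants (no gradient flows into them)
    targets = [G_mean * ri ** alpha for ri in r]

    # L_grad = sum_i |G_i - target_i|; its subgradient w.r.t. w_i is
    # sign(G_i - target_i) * ||grad_W L_i||
    new_weights = []
    for w, g, t, n in zip(weights, G, targets, grad_norms):
        sign = (g > t) - (g < t)
        # Assumption: clamp to keep weights positive (not taken from the paper)
        new_weights.append(max(w - lr * sign * n, 1e-8))

    # Renormalize so the weights sum to the number of tasks
    total = sum(new_weights)
    return [num_tasks * w / total for w in new_weights]


# Toy call: task 0 has a larger gradient norm and a faster-falling loss.
print(gradnorm_step([1.0, 1.0], [[3.0, 4.0], [0.6, 0.8]], [0.5, 0.9], [1.0, 1.0]))
```

In the toy call, task 0's weighted gradient norm sits above its target, so its weight drops below 1, while task 1's weight rises. In a real network the per-task gradients would come from automatic differentiation on the last shared layer rather than being passed in by hand.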
