Abstract
---
A recent breakthrough in deep learning theory shows that the training of over-parameterized deep neural networks can be characterized by a kernel function called the *neural tangent kernel* (NTK). However, it is known that this type of result does not perfectly match practice, as NTK-based analyses require the network weights to stay very close to their initialization throughout training and cannot handle regularizers or gradient noise. In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior. This implies that the training loss converges linearly up to a certain accuracy. We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
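The training scheme the abstract refers to, noisy gradient descent with weight decay on an over-parameterized two-layer network, can be illustrated with a minimal sketch. The snippet below is not the paper's exact algorithm or theoretical setup; the network width, step size, noise scale, weight-decay strength, and synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples in d dimensions, a wide (over-parameterized) hidden layer.
n, d, m = 64, 10, 1024
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sign(X @ rng.standard_normal(d))          # synthetic labels

W = rng.standard_normal((m, d)) / np.sqrt(d)     # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m) # fixed second-layer weights

eta, lam, tau = 0.1, 1e-3, 1e-3                  # step size, weight decay, noise scale (assumed)

def forward(W):
    # Two-layer ReLU network output f(X).
    return np.maximum(X @ W.T, 0.0) @ a

for t in range(200):
    pre = X @ W.T                                # pre-activations, shape (n, m)
    resid = np.maximum(pre, 0.0) @ a - y         # residual of the squared loss
    # Gradient of (1/2n) * ||f(X) - y||^2 with respect to W.
    grad = ((resid[:, None] * (pre > 0)) * a).T @ X / n
    noise = tau * rng.standard_normal(W.shape)   # injected Gaussian gradient noise
    # Noisy gradient descent step with weight decay (L2 regularization).
    W = W - eta * (grad + lam * W + noise)

print("final training loss:", 0.5 * np.mean((forward(W) - y) ** 2))
```

With a sufficiently wide hidden layer and small step size, the training loss in such a sketch typically decreases quickly to a small value, which is the "kernel-like" linear convergence behavior the abstract describes.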
Year | Venue | DocType
---|---|---
2020 | NIPS 2020 | Conference

Volume | Citations | PageRank
---|---|---
33 | 1 | 0.36

References | Authors
---|---
0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Zixiang Chen | 1 | 1 | 0.36 |
Yuan Cao | 2 | 20 | 6.45 |
Quanquan Gu | 3 | 1116 | 78.25 |
Tong Zhang | 4 | 7126 | 611.43 |