| Abstract |
|---|
| State-of-the-art training of deep neural networks requires normalizing neuron activations to accelerate the training process. A standard approach is batch normalization (BN), in which activations are normalized by the mean and standard deviation of the training mini-batch. To keep the transformation invertible, BN also introduces an adaptive gain and bias, applied after the normalization but typically before the non-linearity. In this paper, we investigate the effects of these learnable parameters, gain and bias, on the training of several typical deep neural networks, including All-CNN, Network In Network (NIN), and ResNets. Through extensive experiments, we show that removing the BN layer following the final convolutional layer of a convolutional neural network (CNN) makes little difference to either training convergence or final test accuracy on standard classification tasks. We also observe that, without adaptively updating the learnable parameters of the BN layers, training very deep neural networks such as ResNet-101 often requires less time. |
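The abstract describes the BN transform as normalization by mini-batch statistics followed by a learnable gain and bias. A minimal NumPy sketch of that forward pass is below; the function name, shapes, and `eps` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization of a mini-batch (illustrative sketch).

    x:     (N, D) mini-batch of activations.
    gamma: (D,) learnable gain, applied after normalization.
    beta:  (D,) learnable bias, applied after normalization.
    """
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta             # learnable affine: gain and bias

# With gamma = 1 and beta = 0 (no adaptive gain/bias, as in the paper's
# ablation), the output is just the normalized mini-batch.
x = 3.0 * np.random.randn(64, 8) + 2.0
y = batch_norm(x, np.ones(8), np.zeros(8))
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # True
print(np.allclose(y.var(axis=0), 1.0, atol=1e-3))   # True
```

Freezing `gamma` and `beta` at their initial values (1 and 0) corresponds to the paper's setting of not adaptively updating the learnable BN parameters, while the normalization by mini-batch statistics is kept.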
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3195106.3195150 | Proceedings of 2018 10th International Conference on Machine Learning and Computing (ICMLC 2018) |
Keywords | Field | DocType |
batch normalization, CNN, gain and bias | Convergence (routing),Normalization (statistics),Pattern recognition,Computer science,Convolutional neural network,Artificial intelligence,Invertible matrix,Artificial neural network,Standard deviation,Deep neural networks | Conference |
Citations | PageRank | References |
0 | 0.34 | 8 |
| Authors |
|---|
| 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yan Wang | 1 | 4 | 3.80 |
Xiaofu Wu | 2 | 4 | 2.78 |
Yuanyuan Chang | 3 | 0 | 0.34 |
Suofei Zhang | 4 | 34 | 7.26 |
Quan Zhou | 5 | 10 | 3.58 |
Jun Yan | 6 | 2 | 2.06 |