Title
Batch Normalization: Is Learning An Adaptive Gain And Bias Necessary?
Abstract
The state-of-the-art training of deep neural networks requires normalizing the activities of the neurons to accelerate the training process. A standard approach is batch normalization (BN), in which the activations are normalized by the mean and standard deviation of the training mini-batch. To keep the transformation invertible, BN also introduces an adaptive gain and bias, which are applied after the normalization and usually before the non-linearity. In this paper, we investigate the effects of these learnable parameters, the gain and bias, on the training of several typical deep neural networks, including All-CNN, Network In Network (NIN), and ResNets. Through extensive experiments, we show that there is little difference in either training convergence or final test accuracy when the BN layer following the final convolutional layer is removed from a convolutional neural network (CNN) on standard classification tasks. We also observe that, without adaptively updating the learnable parameters of the BN layers, training very deep neural networks such as ResNet-101 often requires less time.
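The abstract refers to BN's two-step transform: normalize each activation by the mini-batch mean and standard deviation, then apply a learnable gain (gamma) and bias (beta). The sketch below is a minimal NumPy illustration of that transform, not the authors' code; the function name `batch_norm` and the toy data are assumptions made for illustration. Setting gamma=1 and beta=0 corresponds to freezing the adaptive parameters, the setting the paper studies.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Illustrative sketch, not the authors' implementation.
    # Normalize over the batch dimension by the mini-batch statistics,
    # then apply the (optionally learnable) gain and bias.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero-mean, unit-variance activations
    return gamma * x_hat + beta              # scale-and-shift step

# Toy usage: a mini-batch of 4 examples with 3 features each.
x = np.random.randn(4, 3) * 5.0 + 2.0
y = batch_norm(x)                        # gamma=1, beta=0: pure normalization
print(y.mean(axis=0), y.std(axis=0))     # approximately zero mean, unit std
```

In frameworks such as PyTorch, a comparable switch is the `affine` argument of `nn.BatchNorm2d`: setting it to False removes the learnable gain and bias entirely. Whether this matches the exact configuration used in the paper's experiments is not specified in this record.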
Year
2018
DOI
10.1145/3195106.3195150
Venue
Proceedings of 2018 10th International Conference on Machine Learning and Computing (ICMLC 2018)
Keywords
batch normalization, CNN, gain and bias
Field
Convergence (routing), Normalization (statistics), Pattern recognition, Computer science, Convolutional neural network, Artificial intelligence, Invertible matrix, Artificial neural network, Standard deviation, Deep neural networks
DocType
Conference
Citations
0
PageRank
0.34
References
8
Authors
6
Name | Order | Citations | PageRank
Yan Wang | 1 | 4 | 3.80
Xiaofu Wu | 2 | 4 | 2.78
Yuanyuan Chang | 3 | 0 | 0.34
Suofei Zhang | 4 | 34 | 7.26
Quan Zhou | 5 | 10 | 3.58
Jun Yan | 6 | 2 | 2.06