Title
Progressive Mimic Learning: A new perspective to train lightweight CNN models
Abstract
Knowledge distillation (KD) builds a lightweight Student Model (SM) and trains it to approximate a large Teacher Model (TM) by exploiting the knowledge learned by the TM, an approach that has proven effective for training lightweight CNN models. However, training a small SM to achieve strong performance remains challenging. Recent research on human learning behavior shows that both the knowledge held by teachers and the processes by which teachers acquire that knowledge matter to students. Inspired by this observation, in this paper we propose a new perspective, called Progressive Mimic Learning (PML), that trains lightweight CNN models by mimicking the learning trajectory of the TM. To obtain a more powerful SM, PML exploits hints hidden in the TM's learning process. First, the TM's training is divided into multiple stages, and the final state of the TM in each stage is recorded as a landmark; these landmarks compose the TM's learning trajectory. Then, a landmark loss is defined that constrains the SM to progressively mimic the TM's learning process, using the landmarks along the trajectory as training hints for the SM. Experiments on four benchmark data sets, CIFAR-10, CIFAR-100, Fashion-MNIST, and ImageNet-10, evaluate the performance of PML. The results show that SMs trained with PML produce more accurate predictions than SMs trained with counterpart methods.
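The abstract describes PML as a two-phase procedure: record landmarks while training the TM, then train the SM against those landmarks in order. Below is a minimal illustrative sketch in PyTorch, assuming a standard image-classification setup; the function names, the equal-length stage schedule, and the temperature-softened KL divergence used as the landmark loss are assumptions made for illustration, not the authors' released code or exact formulation.

# Minimal PML sketch (PyTorch). Helper names, the stage schedule, and the
# softened-KL landmark loss are illustrative assumptions, not the paper's
# exact formulation.
import copy
import torch
import torch.nn.functional as F

def record_teacher_trajectory(teacher, loader, optimizer, epochs, num_stages):
    # Phase 1: train the teacher model (TM) and snapshot its state at the
    # end of each stage. The ordered snapshots (landmarks) form the TM's
    # learning trajectory.
    landmarks = []
    epochs_per_stage = epochs // num_stages
    teacher.train()
    for epoch in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            F.cross_entropy(teacher(x), y).backward()
            optimizer.step()
        if (epoch + 1) % epochs_per_stage == 0:
            landmarks.append(copy.deepcopy(teacher).eval())
    return landmarks

def train_student_pml(student, landmarks, loader, optimizer, epochs,
                      alpha=0.5, temperature=4.0):
    # Phase 2: train the student model (SM) to progressively mimic the
    # trajectory. In stage k the SM is supervised by the ground truth plus
    # a landmark loss against the k-th landmark (here, KL divergence on
    # temperature-softened logits, a common distillation choice).
    epochs_per_stage = max(1, epochs // len(landmarks))
    student.train()
    for epoch in range(epochs):
        stage = min(epoch // epochs_per_stage, len(landmarks) - 1)
        landmark = landmarks[stage]
        for x, y in loader:
            optimizer.zero_grad()
            logits = student(x)
            with torch.no_grad():
                target = landmark(x)
            ce = F.cross_entropy(logits, y)
            kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                          F.softmax(target / temperature, dim=1),
                          reduction="batchmean") * temperature ** 2
            (alpha * ce + (1 - alpha) * kd).backward()
            optimizer.step()

Note that the last landmark is, by construction, the fully trained TM, so the final stage of this sketch reduces to standard knowledge distillation; the earlier stages are what distinguish trajectory mimicking from ordinary KD.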
Year
2021
DOI
10.1016/j.neucom.2021.04.086
Venue
Neurocomputing
Keywords
Lightweight CNN models, Progressive Mimic Learning, Knowledge distillation
DocType
Journal
Volume
456
ISSN
0925-2312
Citations
1
PageRank
0.40
References
0
Authors
5
Name            Order  Citations  PageRank
Hongbin Ma      1      1          0.40
Shuyuan Yang    2      53         7.60
Dongzhu Feng    3      1          0.40
Licheng Jiao    4      5698       475.84
Luping Zhang    5      1          0.40