Title
Deep Neural Network Compression with Knowledge Distillation Using Cross-Layer Matrix, KL Divergence and Offline Ensemble
Abstract
Knowledge distillation is an approach for compressing deep neural networks (DNNs): the large parameter count and heavy computation of a teacher model are transferred to a smaller student model, so that the smaller model can be deployed on embedded systems. Most knowledge distillation methods transfer information only at the last stage of the DNN. We propose an efficient compression method composed of three parts. First, we propose a cross-layer Gramian matrix to extract more features from the teacher model. Second, we adopt Kullback-Leibler (KL) divergence in an offline deep mutual learning (DML) setting so that the student model finds a wider, more robust minimum. Finally, we propose using an offline ensemble of pre-trained teachers to teach the student model. With ResNet-32 as the teacher model and ResNet-8 as the student model, experimental results show that Top-1 accuracy increases by 4.38% with a 6.11x compression rate and a 5.27x reduction in computation.
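The abstract combines a cross-layer Gramian (FSP-style) matching term, a KL-divergence term on softened logits, and an offline ensemble of pre-trained teachers. The sketch below is a minimal illustration of how such a combined loss could look, assuming PyTorch; the temperature T, the weights alpha and beta, the choice of feature pairs, and the use of averaged teacher logits to represent the offline ensemble are all assumptions, not the paper's exact formulation.

# Minimal sketch of a combined distillation loss (assumptions: PyTorch,
# hypothetical hyperparameters T/alpha/beta, feature pairs with equal spatial size).
import torch
import torch.nn.functional as F

def gram_matrix(feat_a, feat_b):
    # Cross-layer Gramian (FSP-style) matrix between two feature maps.
    # feat_a: (N, C1, H, W), feat_b: (N, C2, H, W); equal H, W assumed.
    n, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.reshape(n, c1, h * w)
    b = feat_b.reshape(n, c2, h * w)
    return torch.bmm(a, b.transpose(1, 2)) / (h * w)  # (N, C1, C2)

def distillation_loss(student_logits, teacher_logits,
                      student_feat_pairs, teacher_feat_pairs,
                      labels, T=4.0, alpha=0.9, beta=1e-3):
    # Cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between softened student and teacher distributions.
    # teacher_logits may be the average over several pre-trained teachers
    # to mimic an offline ensemble (an assumption, not the paper's definition).
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # L2 distance between student and teacher cross-layer Gramian matrices.
    gram_loss = sum(
        F.mse_loss(gram_matrix(sa, sb), gram_matrix(ta, tb))
        for (sa, sb), (ta, tb) in zip(student_feat_pairs, teacher_feat_pairs)
    )
    return (1 - alpha) * ce + alpha * kl + beta * gram_loss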
Year
2020
Venue
2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
Deep Convolutional Model Compression, Knowledge Distillation, Transfer Learning
DocType
Conference
ISSN
2640-009X
ISBN
978-1-7281-8130-1
Citations
0
PageRank
0.34
References
1
Authors
3
Name, Order, Citations, PageRank
Hsing-Hung Chou, 1, 0, 0.34
Ching-Te Chiu, 2, 304, 38.60
Yi-Ping Liao, 3, 0, 0.34