Title
Deep Neural Network Compression with Knowledge Distillation Using Cross-Layer Matrix, KL Divergence and Offline Ensemble
Abstract
Knowledge distillation is an approach for compressing deep neural networks (DNNs): the large parameter count and heavy computation of a teacher model are transferred to a smaller student model, so that the smaller model can be deployed on embedded systems. Most knowledge distillation methods transfer information only at the last stage of the DNN. We propose an efficient compression method composed of three parts. First, we propose a cross-layer Gramian matrix to extract more features from the teacher model. Second, we adopt Kullback-Leibler (KL) divergence in an offline deep mutual learning (DML) setting so that the student model finds a wider, more robust minimum. Finally, we propose using an offline ensemble of pre-trained teachers to teach the student model. With ResNet-32 as the teacher model and ResNet-8 as the student model, experimental results show that Top-1 accuracy increases by 4.38% with a 6.11x compression rate and a 5.27x reduction in computation.
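The abstract combines a cross-layer Gramian (FSP-style) matching term, a KL-divergence term on softened logits, and an offline ensemble of pre-trained teachers. The sketch below is a minimal illustration of how such a combined loss could look, assuming PyTorch; the temperature T, the weights alpha and beta, the choice of feature pairs, and the use of averaged teacher logits to represent the offline ensemble are all assumptions, not the paper's exact formulation.

# Minimal sketch of a combined distillation loss (assumptions: PyTorch,
# hypothetical hyperparameters T/alpha/beta, feature pairs with equal spatial size).
import torch
import torch.nn.functional as F

def gram_matrix(feat_a, feat_b):
    # Cross-layer Gramian (FSP-style) matrix between two feature maps.
    # feat_a: (N, C1, H, W), feat_b: (N, C2, H, W); equal H, W assumed.
    n, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.reshape(n, c1, h * w)
    b = feat_b.reshape(n, c2, h * w)
    return torch.bmm(a, b.transpose(1, 2)) / (h * w)  # (N, C1, C2)

def distillation_loss(student_logits, teacher_logits,
                      student_feat_pairs, teacher_feat_pairs,
                      labels, T=4.0, alpha=0.9, beta=1e-3):
    # Cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between softened student and teacher distributions.
    # teacher_logits may be the average over several pre-trained teachers
    # to mimic an offline ensemble (an assumption, not the paper's definition).
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # L2 distance between student and teacher cross-layer Gramian matrices.
    gram_loss = sum(
        F.mse_loss(gram_matrix(sa, sb), gram_matrix(ta, tb))
        for (sa, sb), (ta, tb) in zip(student_feat_pairs, teacher_feat_pairs)
    )
    return (1 - alpha) * ce + alpha * kl + beta * gram_loss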
Year
2020
Venue
2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
Deep Convolutional Model Compression, Knowledge Distillation, Transfer Learning
DocType
Conference
ISSN
2640-009X
ISBN
978-1-7281-8130-1
Citations
0
PageRank
0.34
References
1
Authors
3
Name, Order, Citations, PageRank
Hsing-Hung Chou, 1, 0, 0.34
Ching-Te Chiu, 2, 304, 38.60
Yi-Ping Liao, 3, 0, 0.34