Title
ADMM-based Weight Pruning for Real-Time Deep Learning Acceleration on Mobile Devices
Abstract
Deep learning solutions are increasingly deployed in mobile applications, at least for the inference phase. Due to their large model sizes and computational requirements, model compression of deep neural networks (DNNs) becomes necessary, especially given the real-time requirements of embedded systems. In this paper, we extend prior work on systematic DNN weight pruning using ADMM (Alternating Direction Method of Multipliers). We integrate ADMM regularization with masked mapping/retraining, thereby guaranteeing solution feasibility and providing high solution quality. Besides superior performance on representative DNN benchmarks (e.g., AlexNet, ResNet), we focus on two new applications, facial emotion detection and eye tracking, and develop a top-down framework of DNN training, model compression, and acceleration on mobile devices. Experimental results show that, with negligible accuracy degradation, the proposed method achieves significant storage/memory reduction and speedup on mobile devices.
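The abstract describes the two-phase pipeline (ADMM regularization followed by masked mapping/retraining) only at a high level. Below is a minimal PyTorch sketch of such a pipeline, assuming magnitude-based projection onto a per-layer sparsity constraint with a uniform keep ratio; the function names, hyperparameters (keep_ratio, rho, step counts), and training loop are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of ADMM-based weight pruning with masked retraining (assumed setup).
import torch
import torch.nn as nn


def project_to_sparsity(weight, keep_ratio):
    """Euclidean projection onto the set of tensors with at most
    keep_ratio * numel nonzeros: keep the largest-magnitude entries."""
    flat = weight.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.abs().topk(k).values.min()
    return torch.where(weight.abs() >= threshold, weight, torch.zeros_like(weight))


def admm_prune(model, loss_fn, data_loader, keep_ratio=0.1, rho=1e-3,
               admm_steps=3, epochs_per_step=1, lr=1e-3):
    """ADMM regularization phase, then masked mapping/retraining."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    Z = [project_to_sparsity(m.weight.detach().clone(), keep_ratio) for m in layers]
    U = [torch.zeros_like(m.weight) for m in layers]
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(admm_steps):
        for _ in range(epochs_per_step):
            for x, y in data_loader:
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                # Augmented Lagrangian term pulls each W toward its sparse copy Z.
                for m, z, u in zip(layers, Z, U):
                    loss = loss + (rho / 2) * torch.norm(m.weight - z + u) ** 2
                loss.backward()
                opt.step()
        # Z-update: project W + U onto the sparsity constraint; U-update: dual ascent.
        with torch.no_grad():
            for i, m in enumerate(layers):
                Z[i] = project_to_sparsity(m.weight + U[i], keep_ratio)
                U[i] = U[i] + m.weight - Z[i]

    # Masked mapping/retraining: hard-prune to the final masks, then fine-tune
    # only the surviving weights so the sparse solution stays feasible.
    masks = []
    with torch.no_grad():
        for m in layers:
            mask = (project_to_sparsity(m.weight, keep_ratio) != 0).float()
            m.weight.mul_(mask)
            masks.append(mask)
    for _ in range(epochs_per_step):
        for x, y in data_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            with torch.no_grad():
                for m, mask in zip(layers, masks):
                    m.weight.mul_(mask)  # keep pruned weights at zero
    return model
```

The design point illustrated here is that the ADMM phase keeps the weights dense while an auxiliary sparse copy Z carries the constraint, and the final masked-retraining phase hard-prunes the weights and fine-tunes only the survivors, which is what makes the delivered model actually satisfy the pruning constraint (solution feasibility).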
Year
2019
DOI
10.1145/3299874.3319492
Venue
Proceedings of the 2019 Great Lakes Symposium on VLSI
Keywords
acceleration, mobile devices, neural networks, real-time
Field
Computer science, Inference, Real-time computing, Mobile device, Regularization (mathematics), Eye tracking, Acceleration, Artificial intelligence, Deep learning, Artificial neural network, Speedup
DocType
Conference
ISSN
1066-1395
ISBN
978-1-4503-6252-8
Citations
2
PageRank
0.43
References
0
Authors
9
Name | Order | Citations | PageRank
Hongjia Li | 1 | 7 | 5.91
Ning Liu | 2 | 15 | 3.59
Xiaolong Ma | 3 | 22 | 5.90
Sheng Lin | 4 | 139 | 14.39
Shaokai Ye | 5 | 38 | 6.53
Tianyun Zhang | 6 | 31 | 6.42
Xue Lin | 7 | 86 | 14.97
Wenyao Xu | 8 | 615 | 77.06
Yanzhi Wang | 9 | 1082 | 136.11