Title: BLCR: Towards Real-time DNN Execution with Block-based Reweighted Pruning
Abstract: Accelerating DNN execution on resource-limited computing platforms has been a long-standing problem. Prior works use ℓ1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models and thereby leverage parallel computing architectures. However, both the pruning schemes and the pruning methods lack universality, which leads to degraded performance and limited applicability. As mobile devices are becoming an important carrier for deep learning tasks, current approaches are not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. To solve this problem, we propose BLCR, a novel block-based pruning framework that comprises a general structured pruning scheme offering high flexibility while exploiting full on-device parallelism, as well as a powerful and efficient reweighted regularization method to achieve the proposed sparsity scheme. Our framework is universal: it can be applied to both CNNs and RNNs, providing complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for-acceleration task, we also integrate compiler-based code optimization into our framework, so that DNN inference can be performed on mobile devices in real time. To the best of our knowledge, this is the first weight pruning framework to achieve universal coverage of both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
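The abstract's core idea, block-based reweighted regularization, can be illustrated with a short sketch. The snippet below is not the paper's implementation; it is a minimal NumPy illustration, under the common assumption that the reweighting coefficient of each weight block is inversely proportional to the block's current squared Frobenius norm, so that already-small blocks are pushed harder toward zero while important blocks are penalized less. The function name, block shape, and `eps` smoothing term are all illustrative choices, not from the paper.

```python
import numpy as np

def block_reweighted_penalty(W, block_shape=(4, 4), eps=1e-3):
    """Illustrative block-based reweighted penalty for a 2-D weight matrix.

    W is tiled into non-overlapping blocks of `block_shape`. Each block
    contributes alpha * ||block||_F^2, where the reweighting coefficient
    alpha = 1 / (||block||_F^2 + eps) is recomputed from the current
    weights, as in reweighted l1/l2 schemes.
    """
    rows, cols = W.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "blocks must tile W exactly"
    penalty = 0.0
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            block = W[i:i + br, j:j + bc]
            sq_norm = float(np.sum(block * block))
            # reweighted contribution: sq_norm / (sq_norm + eps)
            # -> near 0 for blocks already close to zero is NOT the goal;
            #    rather, each block's contribution saturates at ~1, so
            #    large blocks are not over-penalized relative to small ones.
            penalty += sq_norm / (sq_norm + eps)
    return penalty

# In training, this penalty (scaled by a hyperparameter) would be added to
# the task loss; blocks whose norms shrink are eventually pruned away,
# leaving a block-sparse pattern that maps well onto parallel hardware.
```

In practice such coefficients are typically frozen for a number of training steps and then refreshed from the current weights, which is what distinguishes reweighted regularization from a fixed group-lasso penalty.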
Year: 2022
DOI: 10.1109/ISQED54688.2022.9806237
Venue: 2022 23rd International Symposium on Quality Electronic Design (ISQED)
Keywords: parallel computing architectures, mobile devices, deep learning, mobile parallelism, BLCR, on-device parallelism, sparsity scheme, computation-intensive layers, pruning-for-acceleration task, compiler-based code optimization, DNN inference, weight pruning framework, mobile acceleration, DNN execution, block-based pruning framework, reweighted regularization method, RNN, CNN
DocType: Conference
ISSN: 1948-3287
ISBN: 978-1-6654-9467-0
Citations: 0
PageRank: 0.34
References: 0
Authors: 13
Name            Order  Citations  PageRank
Xiaolong Ma         1         22      5.90
Geng Yuan           2          0      2.37
Zhengang Li         3          0      0.34
Yifan Gong          4          0      0.34
Tianyun Zhang       5          0      0.34
Wei Niu             6         24     11.21
Zheng Zhan          7          0      1.01
Pu Zhao             8          0      0.34
Ning Liu            9          0      0.34
Jian Tang          10          0      0.34
Xue Lin            11          1      2.43
Bin Ren            12          0      0.34
Yanzhi Wang        13          7      1.51