Title
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
Abstract
As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the sparsity caused by weight pruning actually hurts overall performance despite large reductions in model size and required multiply-accumulate operations. In addition, encoding the sparse format of pruned networks incurs additional storage overhead. To overcome these challenges, we propose Scalpel, which customizes DNN pruning to the underlying hardware by matching the pruned network structure to the data-parallel hardware organization. Scalpel consists of two techniques: SIMD-aware weight pruning and node pruning. For low-parallelism hardware (e.g., microcontroller), SIMD-aware weight pruning maintains weights in aligned, fixed-size groups to fully utilize the SIMD units. For high-parallelism hardware (e.g., GPU), node pruning removes redundant nodes, not redundant weights, thereby reducing computation without sacrificing the dense matrix format. For hardware with moderate parallelism (e.g., desktop CPU), SIMD-aware weight pruning and node pruning are applied together synergistically. Across the microcontroller, CPU, and GPU, Scalpel achieves mean speedups of 3.54x, 2.61x, and 1.25x while reducing the model sizes by 88%, 82%, and 53%. In comparison, traditional weight pruning achieves mean speedups of only 1.90x, 1.06x, and 0.41x across the three platforms.
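For illustration only (not code from the paper), the following minimal NumPy sketch shows the idea behind SIMD-aware weight pruning as described in the abstract: weights are kept or removed in aligned, fixed-size groups so that surviving weights remain contiguous for SIMD loads. The function name, group size, and L2-norm group scoring are assumptions made for this sketch.

```python
import numpy as np

def simd_aware_prune(weights: np.ndarray, group_size: int = 4,
                     sparsity: float = 0.75) -> np.ndarray:
    """Prune a 2-D weight matrix in aligned, fixed-size groups.

    Groups of `group_size` consecutive weights along each row are kept or
    zeroed together, so remaining weights stay aligned for SIMD units.
    Group importance is its L2 norm; the lowest-scoring groups are removed
    until the target sparsity is reached. (Illustrative sketch only.)
    """
    rows, cols = weights.shape
    assert cols % group_size == 0, "pad columns to a multiple of the group size"

    # View the matrix as aligned groups and score each group by its L2 norm.
    grouped = weights.reshape(rows, cols // group_size, group_size)
    scores = np.linalg.norm(grouped, axis=2)          # shape: (rows, n_groups)

    # Zero the lowest-scoring groups until the target sparsity is met.
    n_prune = int(sparsity * scores.size)
    threshold = np.sort(scores, axis=None)[n_prune - 1] if n_prune > 0 else -np.inf
    keep_mask = (scores > threshold)[..., None]        # broadcast over group dim

    return (grouped * keep_mask).reshape(rows, cols)

# Usage: prune a small layer; zeros appear in aligned runs of 4 within each row.
w = np.random.randn(8, 16).astype(np.float32)
w_pruned = simd_aware_prune(w, group_size=4, sparsity=0.5)
print("achieved sparsity:", np.mean(w_pruned == 0))
```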
Year
2017
DOI
10.1145/3079856.3080215
Venue
ISCA
Keywords
neural network pruning, hardware parallelism, single instruction, multiple data
Field
Computer science, Parallel computing, SIMD, Real-time computing, Microcontroller, Pruning (decision trees), Footprint, Computer hardware, Sparse matrix, Computation, Encoding (memory), Pruning
DocType
Conference
ISBN
978-1-5090-5901-0
Citations
30
PageRank
0.81
References
27
Authors
6
Name                 Order  Citations  PageRank
Jiecao Yu            1      40         1.96
Andrew Lukefahr      2      153        7.08
David J. Palframan   3      68         3.90
Ganesh S. Dasika     4      387        24.30
Reetuparna Das       5      1117       47.07
Scott Mahlke         6      4811       312.08