Abstract |
---|
Compressing neural networks by pruning weights with small magnitudes can significantly reduce computation and storage costs. Although pruning makes the model smaller, it is difficult to obtain a practical speedup on modern computing platforms such as CPUs and GPUs because of the irregular sparsity pattern. Structured pruning has therefore attracted considerable research interest as a way to make sparsity hardware-friendly. Increasing the sparsity granularity improves hardware utilization, but it compromises the sparsity achievable while maintaining accuracy. In this work, we propose a novel method, TETRIS, that achieves both better hardware utilization and higher sparsity. Just as in a tile-matching game, we cluster the irregularly distributed small-magnitude weights into structured groups by reordering the input/output dimensions, and then prune those groups structurally. Results show that TETRIS achieves sparsity comparable to irregular element-wise pruning with negligible accuracy loss. Experiments also show near-ideal speedup, proportional to the sparsity, on GPU platforms. Our method provides a new solution toward algorithm-architecture co-optimization for the accuracy-efficiency trade-off. |
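The core idea in the abstract — reorder the input/output dimensions so that small-magnitude weights cluster together, then prune whole blocks — can be illustrated with a minimal sketch. This is not the authors' implementation; the block size, the L1-norm reordering criterion, and the row-block granularity are simplifying assumptions for illustration only.

```python
import numpy as np

def tetris_style_prune(W, block=4, sparsity=0.5):
    """Illustrative sketch of structured pruning after reordering.

    Reorders the output dimension (rows) of a weight matrix so that
    low-magnitude rows cluster together, then zeroes out whole blocks
    of rows, yielding hardware-friendly structured sparsity.
    """
    # Reorder rows by ascending L1 norm (a stand-in for the paper's
    # reordering step, which permutes input/output dimensions).
    order = np.argsort(np.abs(W).sum(axis=1))
    W_reordered = W[order]

    # Prune the lowest-magnitude row blocks as whole units.
    n_blocks = W.shape[0] // block
    n_prune = int(n_blocks * sparsity)
    mask = np.ones_like(W_reordered)
    mask[: n_prune * block] = 0.0
    return W_reordered * mask, order
```

Because entire blocks are zeroed, a dense kernel can simply skip them, which is why the speedup reported in the abstract scales with the sparsity rather than being lost to irregular memory access.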
Year | Venue | Keywords |
---|---|---|
2018 | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | neural networks,proposed method,tile-matching game,ideal speedup |
Field | DocType | Volume
---|---|---|
Computer science,Parallel computing,Curse of dimensionality,Artificial intelligence,Granularity,Artificial neural network,Tile,Machine learning,Speedup,Pruning,Computation | Conference | 31
ISSN | Citations | PageRank
---|---|---|
1049-5258 | 0 | 0.34
References | Authors
---|---|
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Yu Ji | 1 | 22 | 2.66
Ling Liang | 2 | 12 | 3.07 |
Lei Deng | 3 | 177 | 30.01 |
Youyang Zhang | 4 | 7 | 0.77
Youhui Zhang | 5 | 202 | 28.36 |
Yuan Xie | 6 | 6430 | 407.00 |