Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning - Citegraph

Paper Info

Title
Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning

Abstract
Transfer learning, which transfers knowledge from source datasets to target datasets, is practical for adaptive deep neural network (DNN) applications. Given user privacy and communication bandwidth constraints, training on edge devices is essential for transfer learning. Nevertheless, training requires repeating feedforward (FF), backpropagation (BP), and weight gradient (WG) computation millions of times, which is prohibitively expensive for edge devices. A promising method to reduce training computation is sparse DNN training (SDT), which dynamically prunes weights during training iterations and performs FF, BP, and WG only with unpruned weights. However, SDT suffers from implicit redundancy and reuse imbalance in convolution layers. In addition, it shifts the bottleneck to batch normalization (BN) layers. Achieving energy-efficient SDT computing is therefore challenging. This article proposes a processor, Trainer, that addresses these challenges with three features. First, a speculation mechanism removes implicitly redundant operations, which have nonzero inputs, weights, or outputs but are ineffective for training. Second, a dynamic sparsity adaptive dataflow tackles the reuse imbalance, improving energy efficiency (EE) for dynamic sparse convolution in SDT. Third, a computational dependence decoupled BN unit eliminates BN’s repeated data access to reduce training energy and time. Trainer is fabricated in 28-nm CMOS technology and occupies 20.96 mm² of area. It achieves a peak EE of 173.28 TFLOPS/W at FP16 (276.55 TFLOPS/W at FP8) for 90% activation sparsity and 90% weight sparsity. The sparsity-to-EE conversion ratio is 80.9, outperforming the previous work by 1.55×. When training a ResNet18 model with SDT, Trainer reduces energy by 2.23× and time by 1.76× compared with the state-of-the-art sparse training processor.
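
The SDT idea described in the abstract (prune weights during training, then run FF, BP, and WG only on the unpruned weights) can be illustrated with a minimal software sketch. This is not the Trainer hardware or the authors' algorithm: the magnitude-based pruning criterion, the single linear layer, the squared-error loss, and the names `magnitude_mask` and `sdt_step` are all assumptions chosen for illustration.

```python
# Minimal sketch of one sparse-training step on a single linear layer y = W x.
# Assumptions (not from the paper): magnitude-based pruning, squared-error loss,
# plain SGD, and pruned weights held at zero.

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes roughly the smallest-magnitude weights.

    Weights whose magnitude ties with the threshold are pruned as well.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)  # number of weights to prune
    if k == 0:
        return [[1 for _ in row] for row in weights]
    threshold = flat[k - 1]
    return [[1 if abs(w) > threshold else 0 for w in row] for row in weights]


def sdt_step(weights, mask, x, target, lr):
    """Run FF, BP, and WG using only the unpruned weights (mask == 1)."""
    rows, cols = len(weights), len(x)

    # FF: y_i = sum_j W_ij * x_j, skipping pruned weights.
    y = [sum(weights[i][j] * x[j] for j in range(cols) if mask[i][j])
         for i in range(rows)]

    # Loss L = 0.5 * sum_i (y_i - t_i)^2, so dL/dy_i = y_i - t_i.
    grad_y = [y[i] - target[i] for i in range(rows)]
    loss = 0.5 * sum(g * g for g in grad_y)

    # BP: dL/dx_j = sum_i W_ij * dL/dy_i, skipping pruned weights.
    grad_x = [sum(weights[i][j] * grad_y[i] for i in range(rows) if mask[i][j])
              for j in range(cols)]

    # WG: dL/dW_ij = dL/dy_i * x_j, computed only for unpruned weights.
    new_weights = [
        [weights[i][j] - lr * grad_y[i] * x[j] if mask[i][j] else 0.0
         for j in range(cols)]
        for i in range(rows)
    ]
    return new_weights, grad_x, loss


if __name__ == "__main__":
    w = [[0.5, -0.1], [0.05, 0.8]]
    for step in range(3):
        # "Dynamic" pruning: the mask is recomputed every iteration.
        m = magnitude_mask(w, sparsity=0.5)
        w, gx, loss = sdt_step(w, m, x=[1.0, 2.0], target=[0.0, 0.0], lr=0.1)
        print(step, m, loss)
```

In this sketch the work skipped per step scales with the number of zeros in the mask, which is the computation saving SDT targets; the implicit redundancy, reuse imbalance, and BN bottleneck that Trainer addresses in hardware are not modeled here.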

Year	DOI	Venue
2022	10.1109/JSSC.2022.3174411	IEEE Journal of Solid-State Circuits
Keywords	DocType	Volume
Batch normalization (BN),deep neural network (DNN),processor,sparse training,sparsity,weight pruning	Journal	57
Issue	ISSN	Citations
10	0018-9200	0
PageRank	References	Authors
0.34	5	9

Authors (9 rows)

Name	Order	Citations	PageRank
Yang Wang	1	381	51.96
Yubin Qin	2	1	2.04
Dazheng Deng	3	0	1.69
Jingchuan Wei	4	0	0.34
Tianbao Chen	5	5	1.20
Xinhan Lin	6	0	0.34
Leibo Liu	7	816	116.95
Shaojun Wei	8	555	102.32
Shouyi Yin	9	579	99.95
