Paper Info

Title
TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks

Abstract
Although convolutional neural network (CNN) models have greatly advanced many fields, the enormous number of parameters and computations in these models poses significant performance and energy challenges for hardware implementations. Transferred filter-based methods are very promising techniques that have not yet been explored in the architecture domain, and they can substantially compress CNN models. However, their straightforward hardware implementation inherently incurs massive redundant computations, causing significant energy and time consumption. In this work, a highly efficient transferred filter-based engine (TFE) is developed to alleviate this deficiency by both compressing and accelerating CNN models. First, the filters of CNN models are flexibly transferred according to specific tasks to reduce the model size. Then, two hardware-friendly mechanisms are proposed in the TFE to remove the duplicate computations caused by transferred filters, which further accelerates transferred CNN models. The first mechanism exploits the shared weights hidden in each row of transferred filters and reuses the corresponding identical partial sums, removing at least 25% of the repetitive computations in each row. The second mechanism intelligently schedules and accesses the memory system to reuse the repetitive partial sums among different rows of the transferred filters, eliminating at least 25% of computations. Furthermore, an efficient hardware architecture is proposed in the TFE to fully reap the benefits of the two mechanisms, so that different types of networks are flexibly supported. To achieve high energy efficiency, a sub-array-based filter mapping method (SAFM) is proposed, in which the processing element (PE) sub-array is used as the elementary computational unit to support various filters.
Under SAFM, input data can be efficiently broadcast within each PE sub-array, and the load can be stripped from each PE and greatly alleviated, which dramatically reduces area and power consumption. With the exception of MobileNet-like networks, which adopt depth-wise convolution, most mainstream networks can be compressed and accelerated by the proposed TFE. Two state-of-the-art transferred filter-based methods, doubly CNN and symmetry CNN, are implemented with the TFE. Compared with Eyeriss, average speedups of 2.93× and 3.17× are achieved in the convolutional layers of various modern CNNs, and the overall energy efficiency is improved by 12.66× and 13.31× on average. Compared with other state-of-the-art related works, the TFE achieves up to a 4.0× parameter reduction, a 2.72× speedup, and a 10.74× energy efficiency improvement on VGGNet.
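The abstract's first two mechanisms share one observation. When a transferred filter repeats a weight value within a row, or repeats a whole row, the same products or row partial sums would otherwise be computed more than once. The Python sketch below illustrates only that general principle. It is not the TFE dataflow, its memory scheduling, or SAFM. The function names `row_corr`, `row_corr_shared`, and `conv2d_row_reuse` are hypothetical. The sketch assumes stride-1 "valid" correlation, with filters given as lists of rows.

```python
from typing import Dict, List, Tuple

Row = List[float]


def row_corr(x: Row, w: Row) -> Row:
    """Stride-1 'valid' correlation of one input row with one filter row."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def row_corr_shared(x: Row, w: Row) -> Tuple[Row, int]:
    """Same result as row_corr, but each distinct weight value is multiplied with
    each input element only once; the products are reused wherever that value
    recurs in the row. Returns the output row and the number of multiplications."""
    products: Dict[float, Row] = {v: [v * xi for xi in x] for v in set(w)}
    mults = len(products) * len(x)
    k = len(w)
    out = [sum(products[w[j]][i + j] for j in range(k)) for i in range(len(x) - k + 1)]
    return out, mults


def conv2d_row_reuse(x: List[Row], f: List[Row]) -> Tuple[List[Row], int]:
    """Stride-1 'valid' 2D correlation assembled from 1D row partial sums.
    Each partial sum is keyed by (input row index, filter row contents), so
    filter rows with identical contents share one computation. Returns the
    output and the number of distinct row partial sums actually computed."""
    cache: Dict[Tuple[int, Tuple[float, ...]], Row] = {}
    kh = len(f)
    out: List[Row] = []
    for i in range(len(x) - kh + 1):
        acc: Row = []
        for r in range(kh):
            key = (i + r, tuple(f[r]))
            if key not in cache:  # compute each distinct partial sum only once
                cache[key] = row_corr(x[i + r], f[r])
            ps = cache[key]
            acc = list(ps) if r == 0 else [a + b for a, b in zip(acc, ps)]
        out.append(acc)
    return out, len(cache)
```

Take x = [1, 2, ..., 8] and the symmetric row [0.5, 2.0, 0.5]. `row_corr_shared` returns [6.0, 9.0, 12.0, 15.0, 18.0, 21.0] using 16 multiplications, versus 18 for a direct loop. For this filter, the saving approaches one third as rows get longer.

Now take a 6×6 input and a 3×3 filter whose first and last rows match. `conv2d_row_reuse` computes 10 distinct row partial sums instead of 12. Neither figure is meant to reproduce the paper's reported results.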

Year	2020
DOI	10.1109/MICRO50266.2020.00067
Venue	2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Keywords	TFE, energy-efficient transferred filter-based engine, convolutional neural network models, energy challenges, CNN model, hardware-friendly mechanisms, repetitive computations, hardware architecture, high energy efficiency, sub-array-based filter mapping method, elementary computational unit, energy efficiency improvement, transferred filter-based methods, transferred filter-based engine
DocType	Conference
ISBN	978-1-7281-7384-9
Citations	1
PageRank	0.35
References	12
Authors
10

Name	Order	Citations	PageRank
Huiyu Mo	1	8	3.59
Leibo Liu	2	816	116.95
Wenjing Hu	3	11	6.39
Wenping Zhu	4	22	6.59
Qiang Li	5	599	54.40
Ang Li	6	18	2.46
Shouyi Yin	7	579	99.95
J. L. Chen	8	35	7.77
Xiaowei Jiang	9	6	1.86
Shaojun Wei	10	555	102.32
