Title
TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks
Abstract
Although convolutional neural network (CNN) models have greatly advanced many fields, the very large number of parameters and computations in these models poses significant performance and energy challenges for hardware implementations. Transferred filter-based methods, a very promising class of techniques that has not yet been explored in the architecture domain, can substantially compress CNN models. However, a straightforward hardware implementation of these methods inherently incurs massive redundant computation, costing significant energy and time. In this work, a highly efficient transferred filter-based engine (TFE) is developed to address this deficiency while compressing and accelerating CNN models. First, the filters of CNN models are flexibly transferred according to the specific task to reduce the model size. Then, two hardware-friendly mechanisms are proposed in the TFE to remove the duplicate computations caused by transferred filters, which further accelerates transferred CNN models. The first mechanism exploits the shared weights hidden in each row of the transferred filters and reuses the corresponding identical partial sums, removing at least 25% of the repetitive computations in each row. The second mechanism intelligently schedules and accesses the memory system to reuse the repeated partial sums among different rows of the transferred filters, eliminating at least 25% of the computations. Furthermore, an efficient hardware architecture is proposed in the TFE to fully reap the benefits of the two mechanisms, so that different types of networks are flexibly supported. To achieve high energy efficiency, the sub-array-based filter mapping method (SAFM) is proposed, in which the processing element (PE) sub-array is used as the elementary computational unit to support various filters. With SAFM, input data can be broadcast efficiently within each PE sub-array, and the load is moved off the individual PEs and greatly reduced, which dramatically reduces area and power consumption. Excluding MobileNet-like networks that adopt depth-wise convolution, most mainstream networks can be compressed and accelerated by the proposed TFE. Two state-of-the-art transferred filter-based methods, i.e., doubly CNN and symmetry CNN, are implemented using the TFE. Compared with Eyeriss, they achieve average speedups of 2.93× and 3.17×, respectively, in the convolutional layers of various modern CNNs, and the overall energy efficiency is improved by 12.66× and 13.31× on average. Compared with other state-of-the-art related works, the TFE achieves up to a 4.0× parameter reduction, a 2.72× speedup and a 10.74× energy efficiency improvement on VGGNet.
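To make the row-level redundancy concrete, the following is a minimal sketch and not the TFE dataflow itself. It assumes a translation-style transfer in which each row of a transferred filter is a length-k window of a longer shared "meta" row, so neighbouring filters overlap in k-1 weights. The function names `naive_rows` and `shared_rows` and the toy data are hypothetical, and a software cache stands in for whatever reuse hardware the paper actually uses.

```python
def naive_rows(meta_row, inputs, k):
    """Convolve every translated sub-row independently (valid, stride 1).

    Returns (outputs, number_of_multiplications).
    """
    n_filters = len(meta_row) - k + 1
    n_out = len(inputs) - k + 1
    outputs, mults = [], 0
    for f in range(n_filters):
        row = meta_row[f:f + k]  # weights of the f-th transferred row
        out = []
        for x in range(n_out):
            out.append(sum(row[j] * inputs[x + j] for j in range(k)))
            mults += k
        outputs.append(out)
    return outputs, mults


def shared_rows(meta_row, inputs, k):
    """Same outputs, but each distinct product weight*input is computed once.

    Filter f at position x and filter f+1 at position x+1 use the same
    (weight, input) pairs for their overlapping weights, so those products
    are looked up instead of recomputed.
    """
    n_filters = len(meta_row) - k + 1
    n_out = len(inputs) - k + 1
    products = {}  # (weight index, input index) -> product

    def prod(m, p):
        if (m, p) not in products:
            products[(m, p)] = meta_row[m] * inputs[p]
        return products[(m, p)]

    outputs = [
        [sum(prod(f + j, x + j) for j in range(k)) for x in range(n_out)]
        for f in range(n_filters)
    ]
    return outputs, len(products)


if __name__ == "__main__":
    meta = [1, 2, 3, 4]          # one meta row -> two overlapping 3-tap rows
    data = [5, 6, 7, 8, 9, 10]   # one input row
    ref, n_naive = naive_rows(meta, data, 3)
    out, n_shared = shared_rows(meta, data, 3)
    print(ref == out, n_naive, n_shared)
```

On this toy input both functions produce identical outputs, while the multiplication count drops from 24 to 18. That figure is a property of this small example only and is not a measurement from the paper.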
Year: 2020
DOI: 10.1109/MICRO50266.2020.00067
Venue: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Keywords: TFE, energy-efficient transferred filter-based engine, convolutional neural network models, energy challenges, CNN model, hardware-friendly mechanisms, repetitive computations, hardware architecture, high energy efficiency, sub-array-based filter mapping method, elementary computational unit, energy efficiency improvement, transferred filter-based methods, transferred filter-based engine
DocType: Conference
ISBN: 978-1-7281-7384-9
Citations: 1
PageRank: 0.35
References: 12
Authors: 10
Name | Order | Citations | PageRank
Huiyu Mo | 1 | 8 | 3.59
Leibo Liu | 2 | 816 | 116.95
Wenjing Hu | 3 | 11 | 6.39
Wenping Zhu | 4 | 22 | 6.59
Qiang Li | 5 | 599 | 54.40
Ang Li | 6 | 18 | 2.46
Shouyi Yin | 7 | 579 | 99.95
J. L. Chen | 8 | 35 | 7.77
Xiaowei Jiang | 9 | 6 | 1.86
Shaojun Wei | 10 | 555 | 102.32