Title
Architecting Effectual Computation for Machine Learning Accelerators
Abstract
Inference efficiency is the predominant design consideration for modern machine learning accelerators. The ability to execute multiply-and-accumulate (MAC) operations significantly impacts the throughput and energy consumption of inference. However, MAC operations suffer from significant ineffectual computation, which severely undermines inference efficiency and must be handled appropriately by the accelerator. Ineffectual computation manifests in two ways: first, zero values supplied as input operands to the multiplier waste time and energy while contributing nothing to the model inference; second, zero bits in nonzero values occupy a large portion of the multiplication time yet contribute nothing to the final result. In this article, we propose an ineffectual-free yet cost-effective computing architecture, called split-and-accumulate (SAC), with two essential-bit detection mechanisms to address these intractable problems in tandem. It replaces the conventional MAC operation in the accelerator by manipulating only the essential bits of the parameters (weights) to accomplish the partial-sum computation. It also eliminates multiplications without any accuracy loss and supports a wide range of precision configurations. Based on SAC, we propose an accelerator family called Tetris and demonstrate its application in accelerating state-of-the-art deep learning models. Tetris includes two implementations, targeting either high performance (i.e., cloud applications) or low power consumption (i.e., edge devices), depending on its built-in essential-bit detection mechanism.
We evaluate our design on the Vivado HLS platform and achieve up to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$6.96\times $ </tex-math></inline-formula> performance enhancement and up to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$55.1\times $ </tex-math></inline-formula> energy-efficiency improvement over conventional accelerator designs.
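The split-and-accumulate idea described in the abstract can be illustrated with a minimal sketch (the function names and structure below are ours, not from the paper): a multiply is replaced by one shift-and-add per essential (nonzero) bit of the weight, so zero operands and zero bits cost nothing. The sketch assumes unsigned integer weights for simplicity.

```python
# Hypothetical sketch of split-and-accumulate (SAC), assuming unsigned
# integer weights. Instead of a full-width multiply, the weight's
# essential (nonzero) bits are detected, and the activation is shifted
# and accumulated once per essential bit.

def essential_bits(weight: int):
    """Yield the positions of the set (essential) bits of a weight."""
    pos = 0
    while weight:
        if weight & 1:
            yield pos
        weight >>= 1
        pos += 1

def sac(weight: int, activation: int) -> int:
    """Compute weight * activation using only shifts and adds."""
    acc = 0
    for pos in essential_bits(weight):
        acc += activation << pos  # one shift-add per essential bit
    return acc

# A zero weight triggers no work at all, and a sparse weight such as
# 0b1000001 costs only two shift-adds instead of a full multiplication.
assert sac(0b1000001, 5) == 0b1000001 * 5
assert sac(0, 123) == 0
```

In hardware, the point of this decomposition is that the cost scales with the number of essential bits rather than the operand width, which is why detecting those bits efficiently (the paper's two detection mechanisms) matters.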
Year: 2020
DOI: 10.1109/TCAD.2019.2946810
Venue: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Keywords: Computational modeling, Throughput, Adders, Machine learning, Acceleration, Kernel, Computational efficiency
DocType: Journal
Volume: 39
Issue: 10
ISSN: 0278-0070
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Authors:
1. Mingzhe Zhang
2. Mingzhe Zhang
3. Yinhe Han
4. Qi Wang
5. Huawei Li
6. Xinrong Li