Title
Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference
Abstract
Efficient implementation of deep neural networks (DNNs) on CPU-based systems is critical owing to the proliferation of applications in embedded and Internet of Things systems. Nowadays, most CPUs are equipped with single instruction multiple data (SIMD) instructions, which are used to implement efficient general matrix multiply (GEMM) libraries for accelerating DNN inference. Quantized neural networks are actively investigated to reduce DNN computation and memory requirements; however, current CPU libraries do not efficiently support arithmetic operations below eight bits. Hence, we developed TernGEMM, a GEMM library composed of SIMD instructions for DNNs with ternary weights and sub-8-bit activations. TernGEMM is implemented using simple logical operations that replace the long-latency multiply–add operation. Instead of fixing the accumulation bit precision at 32 bits, TernGEMM accumulates the partial sums in a bit-incremental manner to exploit the higher parallelism of 8-bit and 16-bit SIMD instructions. Furthermore, we propose different tile sizes for TernGEMM to better support the diverse matrix dimensions of DNNs. Compared with GEMMLowp, a state-of-the-art reduced-precision DNN GEMM library, TernGEMM achieves a 1.785× to 4.147× speedup for ResNet50, MobileNet-V2, and EfficientNet-B0, as evaluated on both Intel and ARM CPUs.
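For readers unfamiliar with how a ternary-weight dot product can be reduced to logical operations, the following scalar C sketch shows one common bit-serial formulation: ternary weights are stored as two bitmasks (one marking +1 positions, one marking -1 positions), low-bit activations are stored as bit-planes, and the dot product is evaluated with AND and popcount instead of multiply–add. This is only an illustrative sketch under those assumptions; the dot_ternary helper, the bitmask layout, and the bit widths are hypothetical and are not taken from the TernGEMM source, which additionally keeps partial sums in narrow 8-/16-bit SIMD lanes and widens them incrementally.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical ternary dot product over 64 elements.
 * pos/neg: bit j is set if weight w_j == +1 / == -1 (never both).
 * act_bits[b]: bit j is set if bit b of activation a_j is 1
 * (activations are abits-bit unsigned values).
 * Returns sum_j w_j * a_j using only AND and popcount. */
static int32_t dot_ternary(uint64_t pos, uint64_t neg,
                           const uint64_t *act_bits, int abits)
{
    int32_t acc = 0;
    for (int b = 0; b < abits; ++b) {
        int32_t plus  = __builtin_popcountll(pos & act_bits[b]);
        int32_t minus = __builtin_popcountll(neg & act_bits[b]);
        /* In a SIMD kernel these per-bit-plane counts could be held in
         * 8- or 16-bit lanes and widened to 32 bits only when needed. */
        acc += (plus - minus) << b;
    }
    return acc;
}

int main(void)
{
    enum { N = 64, ABITS = 4 };
    int8_t w[N];
    uint8_t a[N];
    uint64_t pos = 0, neg = 0, act_bits[ABITS] = {0};

    /* Deterministic test data: w_j in {-1,0,+1}, a_j in [0,16). */
    for (int j = 0; j < N; ++j) {
        w[j] = (int8_t)((j % 3) - 1);
        a[j] = (uint8_t)((j * 7) % 16);
        if (w[j] == +1) pos |= 1ULL << j;
        if (w[j] == -1) neg |= 1ULL << j;
        for (int b = 0; b < ABITS; ++b)
            if ((a[j] >> b) & 1) act_bits[b] |= 1ULL << j;
    }

    /* Reference multiply-add result for comparison. */
    int32_t ref = 0;
    for (int j = 0; j < N; ++j) ref += w[j] * a[j];

    printf("bit-serial = %d, reference = %d\n",
           dot_ternary(pos, neg, act_bits, ABITS), ref);
    return 0;
}

Compiled with gcc -O2, the bit-serial and reference results match; in an actual SIMD kernel the popcounts would map to vector instructions operating on many such 64-element groups at once.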
Year
2022
DOI
10.1007/s11265-022-01782-3
Venue
Journal of Signal Processing Systems
Keywords
Matrix multiplication, Implementation, Deep neural networks, Inference
DocType
Journal
Volume
94
Issue
10
ISSN
1939-8018
Citations
0
PageRank
0.34
References
1
Authors
5
Name             Order  Citations  PageRank
Choi Seokhyeon   1      0          0.34
Shim Kyuhong     2      0          0.34
Choi Jungwook    3      0          0.34
Wonyong Sung     4      14451      66.19
Shim Byonghyo    5      0          0.34