Title
A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization
Abstract
Computing-in-memory (CIM) improves energy efficiency by enabling parallel multiply-and-accumulate (MAC) operations and reducing memory accesses [1–4]. However, today's typical neural networks (NNs) usually exceed on-chip memory capacity, so a CIM-based processor may encounter a memory bottleneck [5]. Tensor-train (TT) is a tensor decomposition method that decomposes a d-dimensional tensor into d 4D tensor-cores (TCs: $G_k[r_{k-1}, n_k, m_k, r_k]$, $k = 1, \ldots, d$) [6]. $G_k$ can be viewed as a 2D $n_k \times m_k$ array in which each element is an $r_{k-1} \times r_k$ matrix. The TCs require $\sum_{k=1}^{d} r_{k-1} n_k m_k r_k$ parameters to represent the original tensor, which has $\prod_{k=1}^{d} n_k m_k$ parameters. Since the TT-ranks $r_k$ are typically small, the kernels and weight matrices of convolutional, fully-connected and recurrent layers can be compressed significantly by TT decomposition, thereby enabling storage of an entire NN in a CIM-based processor.
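As a quick illustration of the TT format described in the abstract, the minimal sketch below reconstructs a full tensor from 4D TT-cores and compares the parameter counts $\sum_{k} r_{k-1} n_k m_k r_k$ versus $\prod_{k} n_k m_k$. The ranks, mode sizes, and the tt_reconstruct helper are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Each TT-core G_k has shape (r_{k-1}, n_k, m_k, r_k), with boundary ranks r_0 = r_d = 1.
def tt_reconstruct(cores):
    """Contract 4D TT-cores G_k[r_{k-1}, n_k, m_k, r_k] back into the
    full tensor of shape (n_1, m_1, n_2, m_2, ..., n_d, m_d)."""
    full = cores[0]  # shape (1, n_1, m_1, r_1)
    for core in cores[1:]:
        # Sum over the shared rank index r_k that links consecutive cores.
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the boundary rank-1 dimensions (r_0 = r_d = 1).
    return np.squeeze(full, axis=(0, -1))

# Illustrative example: d = 3 cores with small TT-ranks (hypothetical sizes).
rng = np.random.default_rng(0)
ranks = [1, 4, 4, 1]                 # r_0 ... r_3
modes = [(4, 4), (4, 4), (4, 4)]     # (n_k, m_k) per core
cores = [rng.standard_normal((ranks[k], *modes[k], ranks[k + 1]))
         for k in range(len(modes))]

tt_params = sum(c.size for c in cores)                 # sum_k r_{k-1} n_k m_k r_k = 384
full_params = int(np.prod([n * m for n, m in modes]))  # prod_k n_k m_k = 4096
print("TT parameters:  ", tt_params)
print("Full parameters:", full_params)
print("Reconstructed shape:", tt_reconstruct(cores).shape)  # (4, 4, 4, 4, 4, 4)
```

Even at these small, made-up sizes the TT form stores roughly a tenth of the parameters of the full tensor; with the small ranks typical in practice, this is the compression that lets an entire NN fit in on-chip CIM memory.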
Year
2021
DOI
10.1109/ISSCC42613.2021.9365989
Venue
2021 IEEE International Solid-State Circuits Conference (ISSCC)
DocType
Conference
Volume
64
ISSN
0193-6530
Citations
1
PageRank
0.36
References
0
Authors
12
Name              Order   Citations+PageRank
Ruiqi Guo         1       133.36
Zhiheng Yue       2       11.03
Xin Si            3       496.86
Te Hu             4       22.07
Hao Li            5       26185.92
Limei Tang        6       10.70
Yabing Wang       7       11.37
Leibo Liu         8       816116.95
Meng-Fan Chang    9       45945.63
Qiang Li          10      8121.66
Shaojun Wei       11      555102.32
Shouyi Yin        12      57999.95