15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices

Paper Info

Title
15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices

Abstract
Nonvolatile computing-in-memory (nvCIM) can improve the latency (t_AC) and energy efficiency (EF_MAC) of tiny AI edge devices performing multiply-and-accumulate (MAC) computing after system wake-up. Prior nvCIMs have proven effective for binary input (IN) and weight (W) with 3b output (OUT) [1], 1-8-1b IN-W-OUT [2], and 2-3-4b IN-W-OUT [3] neural networks; however, higher precision (4-4b IN-W) for MAC operations is needed for multibit CNNs to achieve high inference accuracy [4]. As Fig. 15.4.1 shows, improving the precision of nvCIM macros involves various challenges. (1) A large number of activated WLs produces a wide range of BL current (I_BL), resulting in an inaccurate BL-clamping voltage (V_BLC); a large I_BL also requires a large array area because wide metal lines are needed to support the high current density. (2) Previous "WL = input" approaches suffer from: (a) few parallel inputs (IN#) due to (1), and (b) long t_AC, because multibit inputs take multiple cycles of binary WL inputs on 1T1R cells. (3) Previous positive-negative-split weight mapping consumes high total I_BL and incurs area overhead (needing 2x(m-1) cells for a signed m-bit weight) for cell arrays with high weight precision. (4) High-precision outputs require long t_AC and a large number of reference currents (IREF#). To overcome these challenges, this work proposes: (1) a BL-IN-OUT multibit computing (BLIOMC) scheme using a single WL-on and input-aware multibit BL clamping (IA-MBC) to shorten I_BL for multibit inputs, increase IN#, and reduce the I_BL range/size for accurate V_BLC and a compact array area. (2) Scrambled 2's complement (S2C) weight mapping (S2CWM), input-aware source-line (SL) voltage biasing (IA-SLVB), and an S2C value combiner (S2CVC) to reduce area overhead and I_BL in the cell array. (3) A dual-bit small-offset current-mode sense amplifier (DbSO-CSA) to reduce IREF# and t_AC.
A fabricated 22nm 2Mb ReRAM-CIM macro presents the first 4b-input nvCIM macro, featuring a t_AC of 9.8-18.3ns and an EF_MAC of 121.3-28.9TOPS/W from binary to 4bIN-4bW-11bOUT compute precisions.
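
The weight-mapping trade-off in the abstract can be illustrated with a short numerical sketch. It is not the paper's S2C scheme, whose details the abstract does not give; it only shows plain 2's complement bit-plane MAC, where the MSB plane carries a negative weight, and compares cell counts under the assumption of one binary cell per weight bit. The function names, the example vectors, and the one-cell-per-bit assumption are all hypothetical; only the 2x(m-1) figure for positive-negative-split mapping comes from the abstract.

```python
def to_twos_complement_bits(w, m):
    """Return the m bits (LSB first) of signed integer w in 2's complement."""
    assert -(1 << (m - 1)) <= w < (1 << (m - 1)), "weight out of range"
    u = w & ((1 << m) - 1)  # reinterpret as an unsigned m-bit pattern
    return [(u >> i) & 1 for i in range(m)]


def bitplane_mac(inputs, weights, m=4):
    """Compute sum(x * w) one weight bit plane at a time.

    Each plane yields an unsigned partial sum; the MSB plane is scaled
    by -2^(m-1), all other planes by +2^i.
    """
    planes = [to_twos_complement_bits(w, m) for w in weights]
    total = 0
    for i in range(m):
        partial = sum(x * p[i] for x, p in zip(inputs, planes))
        scale = -(1 << i) if i == m - 1 else (1 << i)
        total += scale * partial
    return total


def cells_split_mapping(m):
    """Cells per signed m-bit weight for positive-negative-split mapping,
    as stated in the abstract: 2 x (m - 1)."""
    return 2 * (m - 1)


def cells_twos_complement(m):
    """Cells per signed m-bit weight, ASSUMING one binary cell per bit."""
    return m


# Hypothetical example: unsigned 4b inputs, signed 4b weights.
inputs = [3, 15, 0, 7]
weights = [-8, 7, 5, -1]
result = bitplane_mac(inputs, weights, m=4)
direct = sum(x * w for x, w in zip(inputs, weights))
```

Here `result` and `direct` agree, and for m = 4 the split mapping needs 6 cells per weight against 4 under the one-cell-per-bit assumption.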

Year	DOI	Venue
2020	10.1109/ISSCC19947.2020.9063078	2020 IEEE International Solid-State Circuits Conference (ISSCC)
DocType	ISSN	Citations
Conference	0193-6530	0
PageRank	References	Authors
0.34	0	20

Authors (20 rows)

Name	Order	Citations	PageRank
Cheng-Xin Xue	1	22	4.57
Tsung-Yuan Huang	2	14	2.13
Je-Syu Liu	3	6	1.47
Ting-Wei Chang	4	24	5.30
Hui-Yao Kao	5	7	1.81
Jing-Hong Wang	6	31	4.03
Ta-Wei Liu	7	7	2.83
Shih-Ying Wei	8	0	0.68
Sheng-Po Huang	9	1	0.68
Wei-Chen Wei	10	27	3.94
Yi-Ren Chen	11	8	3.98
Tzu-Hsiang Hsu	12	12	4.74
Yen-Kai Chen	13	2	1.73
Yun-Chen Lo	14	1	1.70
Tai-Hsing Wen	15	1	1.70
Chung-Chuan Lo	16	15	7.24
Ren-Shuo Liu	17	141	9.86
Chih-Cheng Hsieh	18	218	44.84
Kea-Tiong Tang	19	109	28.91
Meng-Fan Chang	20	459	45.63