Title
15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices
Abstract
Nonvolatile computing-in-memory (nvCIM) can improve the latency (t_AC) and energy efficiency (EF_MAC) of tiny AI edge devices performing multiply-and-accumulate (MAC) computing after system wake-up. Prior nvCIMs have proven effective for neural networks with binary input (IN) and weight (W) and 3b output (OUT) [1], 1-8-1b IN-W-OUT [2], and 2-3-4b IN-W-OUT [3]; however, higher precision (4-4b IN-W) for MAC operations is needed for multibit CNNs to achieve high inference accuracy [4]. As Fig. 15.4.1 shows, improving the precision of nvCIM macros involves various challenges. (1) A large number of activated WLs produces a wide range of BL current (I_BL), resulting in an inaccurate BL-clamping voltage (V_BLC); a large I_BL also requires a large array area due to the need for wide metal lines to support high current density. (2) Previous "WL = input" approaches suffer from: (a) few parallel inputs (IN#) due to (1), and (b) long t_AC from the multiple cycles of binary WL inputs on 1T1R cells needed for multibit inputs. (3) Previous positive-negative-split weight mapping consumes high total I_BL and incurs area overhead (needing 2x(m-1) cells for a signed m-bit weight) for cell arrays with high weight precision. (4) High-precision outputs require long t_AC and a large number of reference currents (IREF#). To overcome these challenges, this work proposes: (1) a BL-IN-OUT multibit computing (BLIOMC) scheme using a single WL-on and input-aware multibit BL clamping (IA-MBC) to shorten I_BL for multibit inputs, increase IN#, and reduce the I_BL range/size for accurate V_BLC and a compact array area; (2) scrambled 2's complement (S2C) weight mapping (S2CWM), input-aware source-line (SL) voltage biasing (IA-SLVB), and an S2C value combiner (S2CVC) to reduce area overhead and I_BL in the cell array; and (3) a dual-bit small-offset current-mode sense amplifier (DbSO-CSA) to reduce IREF# and t_AC.
The fabricated 22nm 2Mb ReRAM-CIM macro is the first 4b-input nvCIM macro, featuring a t_AC of 9.8-18.3ns and an EF_MAC of 121.3-28.9TOPS/W from binary to 4bIN-4bW-11bOUT compute precisions.
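As a rough illustration of the arithmetic behind these figures, the following Python sketch compares the cell count of the positive-negative-split mapping (2x(m-1) cells, as stated above) with a plain two's complement mapping, checks that a MAC accumulated per weight bit plane matches the direct result, and converts TOPS/W into energy per operation. It assumes one cell per weight bit for two's complement storage; all function names are hypothetical, and the code models ordinary two's complement only, not the scrambled S2CWM mapping or the circuit behavior of the macro.

```python
def cells_pos_neg_split(m):
    """Cells per signed m-bit weight with positive-negative split: 2 x (m - 1)."""
    return 2 * (m - 1)


def cells_twos_complement(m):
    """Assumption: plain two's complement storage uses one cell per weight bit."""
    return m


def twos_complement_bits(w, m):
    """Return the m two's complement bits of signed integer w, LSB first."""
    if not -(1 << (m - 1)) <= w < (1 << (m - 1)):
        raise ValueError("weight does not fit in m signed bits")
    return [(w >> i) & 1 for i in range(m)]


def mac_bitplane(inputs, weights, m=4):
    """Accumulate inputs per weight bit plane, then combine the planes.

    The MSB plane carries weight -2^(m-1); the other planes carry +2^i.
    """
    planes = [0] * m
    for x, w in zip(inputs, weights):
        for i, bit in enumerate(twos_complement_bits(w, m)):
            planes[i] += x * bit
    positive = sum(p << i for i, p in enumerate(planes[:-1]))
    return positive - (planes[-1] << (m - 1))


def femtojoules_per_op(tops_per_watt):
    """1 TOPS/W equals 1e12 operations per joule, i.e. 1000 fJ per operation."""
    return 1000.0 / tops_per_watt


if __name__ == "__main__":
    inputs = [15, 3, 0, 7]      # 4b unsigned inputs (illustrative values)
    weights = [-8, 7, -1, 2]    # 4b signed weights (illustrative values)
    direct = sum(x * w for x, w in zip(inputs, weights))
    print(mac_bitplane(inputs, weights), direct)
    print(cells_pos_neg_split(4), cells_twos_complement(4))
    print(femtojoules_per_op(121.3), femtojoules_per_op(28.9))
```

For m = 4, the split mapping needs 6 cells per weight against 4 under the one-cell-per-bit assumption. The reported 121.3 and 28.9TOPS/W correspond to roughly 8.2fJ and 34.6fJ per operation, with "operation" counted however the paper defines it.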
Year: 2020
DOI: 10.1109/ISSCC19947.2020.9063078
Venue: 2020 IEEE International Solid-State Circuits Conference (ISSCC)
DocType: Conference
ISSN: 0193-6530
Citations: 0
PageRank: 0.34
References: 0
Authors: 20
Name              Order  Citations  PageRank
Cheng-Xin Xue     1      22         4.57
Tsung-Yuan Huang  2      14         2.13
Je-Syu Liu        3      6          1.47
Ting-Wei Chang    4      24         5.30
Hui-Yao Kao       5      7          1.81
Jing-Hong Wang    6      31         4.03
Ta-Wei Liu        7      7          2.83
Shih-Ying Wei     8      0          0.68
Sheng-Po Huang    9      1          0.68
Wei-Chen Wei      10     27         3.94
Yi-Ren Chen       11     8          3.98
Tzu-Hsiang Hsu    12     12         4.74
Yen-kai Chen      13     2          1.73
Yun-Chen Lo       14     1          1.70
Tai-Hsing Wen     15     1          1.70
Chung-Chuan Lo    16     15         7.24
Ren-Shuo Liu      17     141        9.86
Chih-Cheng Hsieh  18     218        44.84
Kea-Tiong Tang    19     109        28.91
Meng-Fan Chang    20     459        45.63