Title |
---|
15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices |

Abstract |
---|
Nonvolatile computing-in-memory (nvCIM) can improve the latency ($t_{AC}$) and energy efficiency ($EF_{MAC}$) of tiny AI edge devices that perform multiply-and-accumulate (MAC) computing after system wake-up. Prior nvCIMs have proven effective for neural networks with binary input (IN) and weight (W) and 3b output (OUT) [1], 1-8-1b IN-W-OUT [2], and 2-3-4b IN-W-OUT [3]; however, multibit CNNs need higher MAC precision (4-4b IN-W) to achieve high inference accuracy [4]. As Fig. 15.4.1 shows, improving the precision of nvCIM macros involves several challenges. (1) A large number of activated WLs produces a wide range of BL current ($I_{BL}$), resulting in an inaccurate BL-clamping voltage ($V_{BLC}$); a large $I_{BL}$ also requires a large array area, because wide metal lines are needed to support the high current density. (2) Previous "WL = input" approaches suffer from (a) few parallel inputs (IN#) due to (1), and (b) a long $t_{AC}$, because multibit inputs require multiple cycles of binary WL inputs on 1T1R cells. (3) Previous positive-negative-split weight mapping incurs a high total $I_{BL}$ and a large area overhead (2×(m−1) cells for a signed m-bit weight) in cell arrays with high weight precision. (4) High-precision outputs require a long $t_{AC}$ and a large number of reference currents (IREF#). To overcome these challenges, this work proposes: (1) a BL-IN-OUT multibit computing (BLIOMC) scheme that uses a single WL-on and input-aware multibit BL clamping (IA-MBC) to shorten $t_{AC}$ for multibit inputs, increase IN#, and reduce the $I_{BL}$ range and magnitude for an accurate $V_{BLC}$ and a compact array area; (2) scrambled 2's complement (S2C) weight mapping (S2CWM), input-aware source-line (SL) voltage biasing (IA-SLVB), and an S2C value combiner (S2CVC) to reduce the area overhead and $I_{BL}$ in the cell array; and (3) a dual-bit small-offset current-mode sense amplifier (DbSO-CSA) to reduce IREF# and $t_{AC}$. A fabricated 22nm 2Mb ReRAM-CIM macro is the first 4b-input nvCIM macro, achieving a $t_{AC}$ of 9.8-18.3ns and an $EF_{MAC}$ of 121.3-28.9TOPS/W for compute precisions from binary to 4bIN-4bW-11bOUT. |

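The arithmetic behind storing signed multibit weights as individual bits can be sketched as a small behavioral model. This is a minimal Python sketch under stated assumptions, not the macro's circuit and not the paper's S2C scheme: it uses ordinary (unscrambled) two's complement, assumes one weight bit per cell and unsigned 4b inputs, and every function name is hypothetical. It shows that a multibit MAC can be rebuilt from binary input-bit × weight-bit partial sums once the weight MSB is given a negative place value, and it compares the cell count of plain two's complement (m cells, under the one-bit-per-cell assumption) with the 2×(m−1) cells the abstract gives for positive-negative split.

```python
def twos_complement_bits(w: int, m: int) -> list[int]:
    """Return the m two's-complement bits of weight w, LSB first (assumed one bit per cell)."""
    if not -(1 << (m - 1)) <= w < (1 << (m - 1)):
        raise ValueError(f"{w} does not fit in {m}-bit two's complement")
    u = w & ((1 << m) - 1)  # wrap negative values into [0, 2**m)
    return [(u >> k) & 1 for k in range(m)]


def bit_place_value(k: int, m: int) -> int:
    """Place value of weight bit k; the MSB (sign bit) carries -2**(m-1)."""
    return -(1 << k) if k == m - 1 else (1 << k)


def bit_sliced_mac(inputs: list[int], weights: list[int], n_in: int, m: int) -> int:
    """Compute sum(x * w) using only binary (input bit AND weight bit) partial sums."""
    if any(not 0 <= x < (1 << n_in) for x in inputs):
        raise ValueError(f"inputs must be unsigned {n_in}-bit values (assumption)")
    w_bits = [twos_complement_bits(w, m) for w in weights]
    total = 0
    for i in range(n_in):      # input bit position
        for k in range(m):     # weight bit position (one binary cell per bit)
            # Binary partial sum: number of rows where input bit i and weight bit k are both 1.
            partial = sum(((x >> i) & 1) & wb[k] for x, wb in zip(inputs, w_bits))
            total += partial * (1 << i) * bit_place_value(k, m)
    return total


def cells_per_signed_weight(m: int) -> dict[str, int]:
    """Binary cells per signed m-bit weight: plain two's complement vs positive-negative split."""
    return {"twos_complement": m, "pos_neg_split": 2 * (m - 1)}


inputs = [3, 15, 0, 7]     # hypothetical 4b unsigned activations
weights = [-8, 7, 5, -1]   # hypothetical 4b signed weights
print(bit_sliced_mac(inputs, weights, n_in=4, m=4))   # 74
print(sum(x * w for x, w in zip(inputs, weights)))     # 74, direct reference
print(cells_per_signed_weight(4))
```

With m = 4 the sketch counts 4 cells per weight versus 6 for a positive-negative split. How the actual macro scrambles bit placement, biases the SLs, and combines S2C values in hardware is not described in this abstract, so the sketch only illustrates the underlying arithmetic.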
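As a rough sanity check, the reported energy efficiency can be inverted into energy per operation. This takes the TOPS/W figures at face value; how the paper counts operations (for example, whether one MAC is counted as two operations) is not stated here, so the energy per MAC may differ by that factor.

$$
E_{op} = \frac{1}{EF_{MAC}}:\qquad \frac{1\,\mathrm{W}}{121.3\times10^{12}\,\mathrm{op/s}} \approx 8.24\,\mathrm{fJ/op},\qquad \frac{1\,\mathrm{W}}{28.9\times10^{12}\,\mathrm{op/s}} \approx 34.6\,\mathrm{fJ/op}
$$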
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/ISSCC19947.2020.9063078 | 2020 IEEE International Solid-State Circuits Conference (ISSCC) |

DocType | ISSN | Citations |
---|---|---|
Conference | 0193-6530 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 20 |

Name | Order | Citations | PageRank |
---|---|---|---|
Cheng-Xin Xue | 1 | 22 | 4.57 |
Tsung-Yuan Huang | 2 | 14 | 2.13 |
Je-Syu Liu | 3 | 6 | 1.47 |
Ting-Wei Chang | 4 | 24 | 5.30 |
Hui-Yao Kao | 5 | 7 | 1.81 |
Jing-Hong Wang | 6 | 31 | 4.03 |
Ta-Wei Liu | 7 | 7 | 2.83 |
Shih-Ying Wei | 8 | 0 | 0.68 |
Sheng-Po Huang | 9 | 1 | 0.68 |
Wei-Chen Wei | 10 | 27 | 3.94 |
Yi-Ren Chen | 11 | 8 | 3.98 |
Tzu-Hsiang Hsu | 12 | 12 | 4.74 |
Yen-kai Chen | 13 | 2 | 1.73 |
Yun-Chen Lo | 14 | 1 | 1.70 |
Tai-Hsing Wen | 15 | 1 | 1.70 |
Chung-Chuan Lo | 16 | 15 | 7.24 |
Ren-Shuo Liu | 17 | 141 | 9.86 |
Chih-Cheng Hsieh | 18 | 218 | 44.84 |
Kea-Tiong Tang | 19 | 109 | 28.91 |
Meng-Fan Chang | 20 | 459 | 45.63 |