Abstract |
---|
On-chip training of large-scale deep neural networks (DNNs) is challenging due to computational complexity and resource limitations. Compute-in-memory (CIM) architectures exploit analog computation inside the memory array to speed up vector-matrix multiplication (VMM) and alleviate the memory bottleneck. However, existing CIM prototype chips, in particular SRAM-based accelerators, target low-precision inference engines only. In this work, we propose a two-way SRAM array design that performs bi-directional in-memory VMM with minimal hardware overhead. A novel scheme for signed-number multiplication is also proposed to handle the negative inputs that arise in backpropagation. We taped out and validated the proposed two-way SRAM array design in a TSMC 28 nm process. Based on silicon measurement data from the CIM macro, we evaluate the hardware performance of the entire architecture for DNN on-chip training. The experimental data shows that the proposed accelerator achieves an energy efficiency of ~3.2 TOPS/W, and >1000 FPS and >300 FPS for ResNet and DenseNet training on ImageNet, respectively. |
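The abstract notes that backpropagation produces signed (negative) inputs, which a memory array driving only non-negative values cannot apply directly. A common workaround, shown here as a minimal sketch and not necessarily the paper's exact circuit technique, is to split the signed input into its positive and negative parts, run two array passes, and subtract the partial results digitally:

```python
def vmm(x, W):
    """Ideal vector-matrix multiply: y[j] = sum_i x[i] * W[i][j]."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

def signed_vmm(x, W):
    """Two-pass VMM for signed x, modeling an array that accepts only
    non-negative inputs (e.g. voltage pulses on the word lines)."""
    x_pos = [max(v, 0) for v in x]        # positive part of the input
    x_neg = [max(-v, 0) for v in x]       # magnitude of the negative part
    y_pos = vmm(x_pos, W)                 # first array pass
    y_neg = vmm(x_neg, W)                 # second array pass
    return [p - n for p, n in zip(y_pos, y_neg)]  # digital subtraction

x = [2, -3, 1]                            # signed input (e.g. an error term)
W = [[1, 0], [2, 1], [0, 4]]              # weights stored in the array
print(signed_vmm(x, W))                   # → [-4, 1], same as the ideal VMM
```

The two-pass split doubles the number of array operations but keeps the array itself unipolar; the subtraction is cheap in the digital periphery.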
Year | DOI | Venue
---|---|---
2020 | 10.1109/DAC18072.2020.9218524 | Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC)

DocType | ISSN | Citations
---|---|---
Conference | 0738-100X | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 10
Name | Order | Citations | PageRank |
---|---|---|---|
Hongwu Jiang | 1 | 16 | 6.77 |
Shanshi Huang | 2 | 15 | 6.75 |
Xiaochen Peng | 3 | 61 | 12.17 |
Jian-Wei Su | 4 | 13 | 3.61 |
Yen-Chi Chou | 5 | 8 | 2.20 |
Wei-Hsing Huang | 6 | 25 | 2.56 |
Ta-Wei Liu | 7 | 7 | 2.83 |
Ruhui Liu | 8 | 10 | 1.93 |
Meng-Fan Chang | 9 | 459 | 45.63 |
Shimeng Yu | 10 | 490 | 56.22 |