Abstract
---
In-memory computing reduces the latency and energy consumption of Deep Neural Networks (DNNs) by reducing the number of off-chip memory accesses. However, crossbar-based in-memory computing may significantly increase the volume of on-chip communication, since the weights and activations are stored on-chip. State-of-the-art interconnect methodologies for in-memory computing deploy a bus-based network or a mesh-based Network-on-Chip (NoC). Our experiments show that up to 90% of the total inference latency of DNN hardware is spent on on-chip communication when a bus-based network is used. To reduce the communication latency, we propose a methodology that generates an NoC architecture, along with a scheduling technique, customized for each DNN. We prove mathematically that the generated NoC architecture and corresponding schedules achieve the minimum possible communication latency for a given DNN. Furthermore, we generalize the proposed solution for both edge computing and cloud computing. Experimental evaluations on a wide range of DNNs show that the proposed NoC architecture reduces communication latency by 20%–80% with respect to state-of-the-art interconnect solutions.
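
The abstract summarizes the approach but not the algorithm itself. As a rough, hypothetical illustration of why the interconnect can dominate inference latency, the Python sketch below compares a shared bus, which serializes all inter-layer transfers, against dedicated per-layer-pair NoC links, which can overlap transfers under a pipelined schedule. The traffic volumes, bandwidths, and function names are all invented for illustration; this is not the paper's methodology.

```python
# Hypothetical illustration (not the paper's algorithm): compare the
# communication latency of inter-layer DNN traffic on a shared bus,
# where every transfer is serialized, against dedicated NoC links,
# where transfers on disjoint links overlap in time.

# Traffic volume (bits) moved between consecutive layers; values are made up.
layer_traffic_bits = [8e6, 4e6, 2e6, 1e6]

BUS_BANDWIDTH_BPS = 1e9   # assumed shared-bus bandwidth
LINK_BANDWIDTH_BPS = 1e9  # assumed per-link NoC bandwidth

def bus_latency(traffic):
    """Shared bus: one transfer at a time, so per-transfer latencies add up."""
    return sum(v / BUS_BANDWIDTH_BPS for v in traffic)

def noc_latency(traffic):
    """Dedicated link per layer pair: with a pipelined schedule, the
    slowest link bounds the steady-state communication latency."""
    return max(v / LINK_BANDWIDTH_BPS for v in traffic)

if __name__ == "__main__":
    print(f"bus latency: {bus_latency(layer_traffic_bits) * 1e3:.2f} ms")
    print(f"NoC latency: {noc_latency(layer_traffic_bits) * 1e3:.2f} ms")
```

Even this toy model shows the qualitative effect the abstract reports: serializing all transfers on one bus makes communication latency grow with the sum of the traffic, while a DNN-specific NoC with parallel links is bounded by the heaviest link, which is what a customized topology and schedule can exploit.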
Year | DOI | Venue
---|---|---
2020 | 10.1109/JETCAS.2020.3015509 | IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Keywords | DocType | Volume
---|---|---
Integrated circuit interconnections, System-on-chip, Hardware, Memory management, Acceleration, Cloud computing | Journal | 10

Issue | ISSN | Citations
---|---|---
3 | 2156-3357 | 3

PageRank | References | Authors
---|---|---
0.40 | 0 | 6

Name | Order | Citations | PageRank |
---|---|---|---
Sumit K. Mandal | 1 | 12 | 1.92 |
Gokul Krishnan | 2 | 24 | 7.77 |
Chaitali Chakrabarti | 3 | 1978 | 184.17 |
Jae-sun Seo | 4 | 536 | 56.32 |
Yu Cao | 5 | 2765 | 245.91 |
Umit Y. Ogras | 6 | 3 | 0.40 |