Abstract
---
This work presents a DNN accelerator architecture designed for efficient inference on compressed, sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed that operates on the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new dataflow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99x and 1.95x faster and 20.38x and 3.04x more energy efficient than CPU and mGPU platforms, respectively.
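The core idea of operating on run-length-compressed sparse data without decompressing can be illustrated with a minimal Python sketch. This is only an illustration of the general technique, not the paper's actual hardware scheme: the encoding as `(value, zeros_before)` pairs and the helper names are assumptions for the example.

```python
def rle_encode(vec):
    """Run-length encode a sparse vector as (value, zeros_before) pairs.

    Only nonzero values are stored, each tagged with the length of the
    zero run that precedes it (an assumed, simplified RLE format).
    """
    pairs, zeros = [], 0
    for v in vec:
        if v == 0:
            zeros += 1
        else:
            pairs.append((v, zeros))
            zeros = 0
    return pairs


def rle_dot(pairs, dense):
    """Dot product of an RLE-compressed sparse vector with a dense vector,
    computed directly in the compressed domain: zero runs are skipped by
    advancing the index, so no decompressed vector is ever materialized.
    """
    total, idx = 0, 0
    for value, zeros in pairs:
        idx += zeros              # skip over the run of zeros
        total += value * dense[idx]
        idx += 1
    return total


x = [0, 0, 3, 0, 5, 0, 0, 2]      # sparse activations
w = [1, 1, 2, 1, 4, 1, 1, 3]      # dense weights
enc = rle_encode(x)               # [(3, 2), (5, 1), (2, 2)]
print(rle_dot(enc, w))            # 3*2 + 5*4 + 2*3 = 32
```

Only the nonzero terms contribute work, which is why such a scheme pays off when activations and weights are highly sparse.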
Year | DOI | Venue |
---|---|---|
2021 | 10.1587/transinf.2020EDL8153 | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS |
Keywords | DocType | Volume
---|---|---
deep neural networks, field programmable gate array, run-length compression, sparse data | Journal | E104D

Issue | ISSN | Citations
---|---|---
5 | 1745-1361 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Hao Xiao | 1 | 14 | 2.43 |
Kaikai Zhao | 2 | 0 | 0.34 |
Guangzhu Liu | 3 | 0 | 0.34 |