Title
A Novel DSP Architecture for Scientific Computing and Deep Learning
Abstract
Exascale computing requires accelerators with ultrahigh power efficiency. Digital signal processors (DSPs), among the most important embedded processors and widely known for their high power efficiency, have rarely been explored in the HPC community. We propose a 64-bit general-purpose DSP architecture, FT-Matrix2000, which not only integrates the main features of DSPs but also introduces several novel enhancements for scientific computing. The FT-Matrix2000 architecture comprises multiple FT-Matrix2 cores and optional RISC CPU cores. The FT-Matrix2 core uses a VLIW+SIMD architecture, supports double-precision operations, and optimizes both the data path and the control path for scientific computing. Our evaluations show that FT-Matrix2000 achieves a performance of 1107 GFLOPS at an efficiency of 92.25%. Compared with the MIC and a 40 nm process GPU, FT-Matrix2000 improves GEMM power efficiency by factors of 1.49 and 2.68, respectively. We built a prototype supercomputer with FT-Matrix2000/12. Its HPL efficiency reaches 62.2%, and its performance-to-power ratio is 5.33 GFLOPS/W, which would rank fourth in the latest Green500 list. These results validate that the FT-Matrix2000 architecture is suitable for scientific computing while retaining its efficiency for signal processing. Moreover, the enhancements of FT-Matrix2000 for vector and matrix computations also enable it to efficiently support deep learning applications. We have implemented several typical DCNN models on FT-Matrix2000, NVIDIA GPUs, and the Cadence Vision P6 DSP. The experiments demonstrate that the average computation efficiency of the proposed FT-Matrix2000 architecture is about 20% to 35% higher than that of the GPUs and about 8% higher than that of the Vision P6 DSP.
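The headline figures above can be related with a short calculation. The sketch below is illustrative only: it assumes that "efficiency" means sustained GFLOPS divided by theoretical peak GFLOPS and uses the conventional 2·M·N·K operation count for GEMM. Neither definition is spelled out in this abstract, and the helper names (`gemm_flops`, `efficiency`, `implied_peak`) are hypothetical. Under that assumption, 1107 GFLOPS at 92.25% efficiency implies a theoretical peak of about 1200 GFLOPS.

```python
# Hypothetical helpers; assumes efficiency = sustained rate / theoretical peak.

def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for an (m x k) by (k x n) matrix product.

    Each of the m*n output elements needs k multiplies and k adds,
    giving the conventional 2*m*n*k count used to report GEMM GFLOPS.
    """
    return 2 * m * n * k


def efficiency(sustained_gflops: float, peak_gflops: float) -> float:
    """Computation efficiency: sustained rate as a fraction of theoretical peak."""
    return sustained_gflops / peak_gflops


def implied_peak(sustained_gflops: float, eff: float) -> float:
    """Theoretical peak implied by a reported sustained rate and efficiency."""
    return sustained_gflops / eff


if __name__ == "__main__":
    # Figures from the abstract: 1107 GFLOPS at 92.25% efficiency.
    peak = implied_peak(1107.0, 0.9225)
    print(f"implied peak: {peak:.1f} GFLOPS")
    print(f"flops for a 1024^3 GEMM: {gemm_flops(1024, 1024, 1024)}")
```

Dividing a GEMM's operation count by its measured run time gives the sustained rate, which is then compared against the peak to obtain the efficiency percentage.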
Year: 2019
DOI: 10.1109/ACCESS.2019.2905302
Venue: IEEE Access
Keywords: DSP, architecture, scientific computing, HPC
Field: Electrical efficiency, Exascale computing, Signal processing, Digital signal processing, Supercomputer, Digital signal processor, Very long instruction word, FLOPS, Computer science, Computational science
DocType: Journal
Volume: 7
ISSN: 2169-3536
Citations: 1
PageRank: 0.41
References: 0
Authors: 5
Authors (in order): Chao Yang, Shuming Chen, Jian Zhang, Zhao Lv, Zhi Wang