Abstract |
---|
The development of flexible Convolutional Neural Network (CNN) accelerators is critical for large-scale inference and training. Accelerators based on the General Matrix Multiplication (GEMM) kernel have gained popularity due to their ability to accelerate the most prevalent convolutional and fully connected layers in CNNs. However, the convolution inputs must first be reshaped and packed into redundant matrices, a task performed by the im2col (image to column) algorithm. As the performance of the GEMM kernel improves, im2col accounts for a growing share of total latency and gradually becomes a bottleneck. To address this issue, we propose Smart Data Stream Transformation (SDST), a technique that eliminates explicit data transformation through data stream manipulation. SDST divides the input data into conflict-free streams based on the locality of data redundancy. Additionally, we design a continuity-friendly data layout to unify the transformations across data streams. Our design is evaluated by running the YoloV3-tiny model on an FPGA-based prototype system. Experimental results show that SDST improves convolution acceleration performance by a factor of 1.12 to 5.69 compared to explicit im2col performed on the CPU. |
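The im2col transformation described in the abstract can be illustrated with a minimal NumPy sketch (names, shapes, and the single-stride case are illustrative, not the paper's implementation): each sliding window of the input is copied into a column of a matrix, duplicating overlapping data, so that convolution reduces to one GEMM between the flattened filters and the column matrix.

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unroll sliding kh x kw windows of a (C, H, W) input into columns,
    so convolution can be computed as a single GEMM.
    Returns a matrix of shape (C*kh*kw, out_h*out_w)."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Overlapping windows are copied wholesale: this redundancy
            # is the memory/latency cost that SDST aims to eliminate.
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * out_w + j] = patch.ravel()
    return cols

# Convolution as GEMM: filters (F, C, kh, kw) flattened to (F, C*kh*kw).
x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
w = np.ones((3, 2, 3, 3), dtype=np.float32)
cols = im2col(x, 3, 3)          # (18, 4): 2x2 output positions, each a column
out = w.reshape(3, -1) @ cols   # (3, 4): 3 filters applied via one GEMM
```

Note that the 4x4 input here expands to an 18x4 column matrix: most input elements appear in several columns, which is the explicit data redundancy the paper's stream-based approach avoids materializing.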
Year | DOI | Venue
---|---|---
2022 | 10.1109/CCGrid54584.2022.00049 | 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

Keywords | DocType | ISBN
---|---|---
Convolutional Neural Network, Accelerator, Data Stream | Conference | 978-1-6654-9957-6

Citations | PageRank | References
---|---|---
0 | 0.34 | 7
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chunhua Xiao | 1 | 0 | 8.45 |
Chen Shi | 2 | 0 | 0.68 |
Dandan Xu | 3 | 0 | 1.35 |
Fangzhu Lin | 4 | 0 | 0.34 |
Kun Ning | 5 | 0 | 0.68 |