Abstract | ||
---|---|---|
Field Programmable Gate Array (FPGA) platforms have become a popular choice for deploying Convolutional Neural Networks (CNNs) because of their high parallelism and low energy consumption. Because the on-chip resources of a single board are limited, FPGA clusters are a promising way to improve CNN throughput. In this paper, we first put forward strategies to optimize resource allocation within and across FPGA boards. We then model the multi-board cluster problem and design algorithms based on the knapsack problem and dynamic programming to calculate the optimal topology of the FPGA cluster. We also give a quantitative analysis of the inter-board data transmission bandwidth requirement. To make our design applicable to more situations, we provide solutions for deploying fully connected layers and special convolution layers with large memory requirements. Experimental results show that typical well-known CNNs deployed with the proposed FPGA cluster topology can obtain higher throughput per board than single-board solutions and other multi-board solutions.
|
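
The abstract names dynamic programming but does not give the formulation. As a rough illustration only, and not the authors' algorithm, the sketch below shows one common way dynamic programming can balance a layer pipeline across boards: split the layers into contiguous stages so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The function name `partition_layers`, the per-layer `costs`, and the `boards` count are hypothetical, and the sketch ignores the on-chip resource and inter-board bandwidth constraints that the paper models.

```python
from itertools import accumulate


def partition_layers(costs, boards):
    """Split layers into contiguous pipeline stages, one stage per board.

    costs  -- hypothetical per-layer compute cost (e.g. cycles), non-negative
    boards -- number of boards available
    Returns (bottleneck_cost, stages), where stages is a list of layer-index lists.
    """
    n = len(costs)
    if n == 0 or boards < 1:
        raise ValueError("need at least one layer and one board")
    boards = min(boards, n)  # extra boards cannot help in this simplified model

    # prefix[i] is the total cost of layers 0..i-1
    prefix = [0] + list(accumulate(costs))
    inf = float("inf")

    # best[k][i]: smallest achievable bottleneck when the first i layers use k boards
    best = [[inf] * (n + 1) for _ in range(boards + 1)]
    # cut[k][i]: index where the last stage starts in that best solution
    cut = [[0] * (n + 1) for _ in range(boards + 1)]
    best[0][0] = 0

    for k in range(1, boards + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                # Last stage holds layers j..i-1; earlier layers use k-1 boards.
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j

    # Walk the cut table backwards to recover the stages.
    stages = []
    i = n
    for k in range(boards, 0, -1):
        j = cut[k][i]
        stages.append(list(range(j, i)))
        i = j
    stages.reverse()
    return best[boards][n], stages
```

Calling `partition_layers([4, 2, 3, 7, 1, 5], 3)` returns the bottleneck stage cost together with the layer indices assigned to each of the three boards. A fuller model would add per-board resource limits, which is where a knapsack-style formulation would come in.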
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3373087.3375366 | FPGA |
Field | DocType | ISBN |
Cluster (physics), Computer science, Parallel computing, Field-programmable gate array, Throughput | Conference | 978-1-4503-7099-8 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ruihao Li | 1 | 40 | 9.88 |
Ke Liu | 2 | 20 | 16.97 |
Mengying Zhao | 3 | 81 | 8.31 |
Zhaoyan Shen | 4 | 14 | 7.94 |
Xiaojun Cai | 5 | 6 | 4.17 |
Zhiping Jia | 6 | 463 | 60.64 |