Abstract |
---|
The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a particular task. The efficient mapping of multiple CNNs on a single FPGA device is a challenging task as the allocation of compute resources and external memory bandwidth needs to be optimised at design time. This paper proposes f-CNNx, an automated toolflow for the optimised mapping of multiple CNNs on FPGAs, comprising a novel multi-CNN hardware architecture together with an automated design space exploration method that considers the user-specified performance requirements for each model to allocate compute resources and generate a synthesisable accelerator. Moreover, f-CNNx employs a novel scheduling algorithm that alleviates the limitations of the memory bandwidth contention between CNNs and sustains the high utilisation of the architecture. Experimental evaluation shows that f-CNNx's designs outperform contention-unaware FPGA mappings by up to 50% and deliver up to 6.8x higher performance-per-Watt over highly optimised GPU designs for multi-CNN systems. |

Year | DOI | Venue |
---|---|---|
2018 | 10.1109/FPL.2018.00072 | 2018 28th International Conference on Field Programmable Logic and Applications (FPL) |

Keywords | DocType | Volume |
---|---|---|
Convolutional Neural Networks,Multi CNN Systems,FPGAs,Design space exploration,Multiple CNNs,Latency sensitive deep learning | Conference | abs/1805.10174 |

ISSN | ISBN | Citations |
---|---|---|
1946-147X | 978-1-5386-8518-1 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 11 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Stylianos I. Venieris | 1 | 106 | 12.98 |
Christos-Savvas Bouganis | 2 | 400 | 49.04 |