Abstract | ||
---|---|---|
BSTRACTSemantic segmentation constitutes the backbone of many mobile vision systems, spanning from robot navigation to augmented reality and teleconferencing. Frequently operating under stringent latency constraints within the limited resource envelope of embedded/mobile devices, optimising for efficient execution becomes important. To this end, we propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth. Upon deployment, the predictions of these exits can be exploited either in a dynamic (input-adaptive) way, to save computation during inference on easier samples; or in a static (device-adaptive) setting, to accommodate deployment under varying device capabilities without the need of retraining. Designing and training such networks naively can hurt performance. Thus, we propose a two-staged training process that pushes semantically important features early in the network. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the deployment scenario and application-specific requirements. Optimising for speed, MESS networks deliver latency gains of up to 2.65× over state-of-the-art methods with no accuracy degradation. Accordingly, optimising for accuracy, we achieve an improvement of up to 5.33 pp, under the same computational budget. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1145/3498361.3538791 | Mobile Systems, Applications, and Services |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexandros Kouris | 1 | 0 | 0.34 |
Stylianos I. Venieris | 2 | 106 | 12.98 |
Stefanos Laskaridis | 3 | 0 | 0.34 |
Nicholas D. Lane | 4 | 4247 | 248.15 |