Abstract | | |
---|---|---|
Several recent works seek to create lightweight deep networks for video object detection on mobile devices. We observe that many existing detectors, previously deemed computationally costly for mobiles, intrinsically support adaptive inference and thus offer a multi-branch object detection framework (MBODF). Here, an MBODF refers to a solution that has many execution branches, from which one can dynamically choose at inference time to satisfy varying latency requirements (e.g., by varying the resolution of an input frame). In this paper, we ask, and answer, a wide-ranging question that applies across all MBODFs: how should one expose the right set of execution branches, and how should one then schedule the optimal branch at inference time? In addition, we uncover the importance of making a content-aware decision on which branch to run, since the optimal branch is conditioned on the video content. Finally, we explore content-aware schedulers, first an Oracle scheduler and then a practical one that leverages various lightweight feature extractors. Our evaluation shows that, when layered on a Faster R-CNN-based MBODF, our SmartAdapt achieves a higher Pareto-optimal curve in the accuracy-vs-latency space than 7 baselines on the ILSVRC VID dataset. |
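
The branch-scheduling problem described in the abstract can be illustrated with a minimal Python sketch, assuming each branch has a profiled per-frame latency and a content-conditioned accuracy estimate. The branch names (`r224_p100`, `r448_p10`, `r576_p100`), the single scalar "clutter" feature, and the linear accuracy predictors are all hypothetical stand-ins, not SmartAdapt's actual branches, feature extractors, or models. `schedule` picks the feasible branch with the highest predicted accuracy under a latency budget, and `pareto_front` keeps the (latency, accuracy) points that no other point dominates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Branch:
    """One execution branch of a hypothetical MBODF (e.g. a resolution/proposal setting)."""
    name: str
    latency_ms: float  # profiled per-frame latency, assumed known in advance
    predict_acc: Callable[[Sequence[float]], float]  # accuracy estimate given content features


def schedule(branches: Sequence[Branch], features: Sequence[float],
             budget_ms: float) -> Optional[Branch]:
    """Content-aware choice: best predicted accuracy among branches within the budget."""
    feasible = [b for b in branches if b.latency_ms <= budget_ms]
    if not feasible:
        return None  # no branch can meet the latency requirement
    return max(feasible, key=lambda b: b.predict_acc(features))


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep (latency, accuracy) points not dominated by a faster-or-equal, more accurate one."""
    front: List[Tuple[float, float]] = []
    best_acc = float("-inf")
    # Sort by latency, and at equal latency put the most accurate point first.
    for lat, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            front.append((lat, acc))
            best_acc = acc
    return front


def example_branches() -> List[Branch]:
    """Hypothetical branches; features[0] is a made-up 'clutter' score in [0, 1]."""
    return [
        # Low resolution, many proposals: predicted to do better in cluttered scenes.
        Branch("r224_p100", 40.0, lambda f: 0.35 + 0.30 * f[0]),
        # Higher resolution, few proposals: predicted to do better in sparse scenes.
        Branch("r448_p10", 60.0, lambda f: 0.60 - 0.25 * f[0]),
        # High resolution, many proposals: most accurate but slowest.
        Branch("r576_p100", 150.0, lambda f: 0.70 + 0.05 * f[0]),
    ]


if __name__ == "__main__":
    branches = example_branches()
    print(schedule(branches, [0.1], budget_ms=80).name)  # prints r448_p10
    print(schedule(branches, [0.9], budget_ms=80).name)  # prints r224_p100
```

With an 80 ms budget, the low-clutter frame selects `r448_p10` while the high-clutter frame selects `r224_p100`, which is the kind of content dependence the abstract describes; a content-agnostic scheduler would have to commit to one of them for every frame.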
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/CVPR52688.2022.00256 | IEEE Conference on Computer Vision and Pattern Recognition |
Keywords | DocType | Volume |
Vision applications and systems, Efficient learning and inferences, Machine learning, Motion and tracking, Recognition: detection, categorization, retrieval | Conference | 2022 |
Issue | Citations | PageRank |
1 | 0 | 0.34 |
References | Authors | |
0 | 7 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Xu Ran | 1 | 3 | 0.79 |
Fangzhou Mu | 2 | 0 | 0.34 |
Jayoung Lee | 3 | 5 | 1.46 |
Preeti Mukherjee | 4 | 0 | 0.34 |
Somali Chaterji | 5 | 36 | 9.75 |
Saurabh Bagchi | 6 | 2022 | 144.72 |
Yin Li | 7 | 797 | 35.85 |