Abstract |
---|
Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slow network runs on sparsely sampled keyframes and the fast shallow network runs on non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features, so that keyframes and non-keyframes are well aligned. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. MILA obtains competitive accuracy compared to the state-of-the-art on two multi-task learning benchmarks while reducing the number of floating point operations (FLOPs) by up to 70%. In addition, our attention-based feature propagation method (IIA) outperforms prior work in terms of task accuracy while also reducing FLOPs by up to 90%. |
Field | Value
---|---
Year | 2021
DOI | 10.1109/ICCVW54120.2021.00251
Venue | 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021)
Keywords | n/a
DocType | Conference
Volume | 2021
Issue | 1
ISSN | 2473-9936
Citations | 0
PageRank | 0.34
References | 0
Authors | 8
Name | Order | Citations | PageRank
---|---|---|---
Donghyun Kim | 1 | 0 | 1.69 |
Tian Lan | 2 | 0 | 0.34 |
Chuhang Zou | 3 | 0 | 0.68 |
Ning Xu | 4 | 0 | 0.34 |
Bryan A. Plummer | 5 | 76 | 8.15 |
Stan Sclaroff | 6 | 5631 | 705.89 |
Jayan Eledath | 7 | 1 | 1.02 |
Gérard G. Medioni | 8 | 0 | 0.68 |