Abstract |
---|
Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Network (MAIN), overcomes challenging segmentation scenarios over arbitrary videos without modeling sequence- or instance-specific knowledge. We design MAIN to segment multiple instances in a single forward pass, and optimize it with a novel loss function that favors class-agnostic predictions and assigns instance-specific penalties. We achieve state-of-the-art performance on the challenging YouTube-VOS dataset and benchmark, improving the unseen Jaccard and F-Metric by 6.8% and 12.7%, respectively, while operating in real time (30.3 FPS). |
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.cviu.2021.103240 | Computer Vision and Image Understanding |
Keywords | DocType | Volume |
---|---|---|
65D19, 68T45, 68T10 | Journal | 210 |
Issue | ISSN | Citations |
---|---|---|
1 | 1077-3142 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Juan Carlos León | 1 | 1 | 3.12 |
María Alejandra Bravo | 2 | 0 | 0.34 |
Guillaume Jeanneret | 3 | 0 | 1.35 |
Ali K. Thabet | 4 | 19 | 7.10 |
Thomas Brox | 5 | 7866 | 327.52 |
Pablo Arbelaez | 6 | 3626 | 173.00 |
Bernard Ghanem | 7 | 1487 | 81.44 |