Title
MAIN: Multi-Attention Instance Network for video segmentation
Abstract
Instance-level video segmentation requires tight integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Network (MAIN), copes with challenging segmentation scenarios in arbitrary videos without modeling sequence- or instance-specific knowledge. We design MAIN to segment multiple instances in a single forward pass, and optimize it with a novel loss function that favors class-agnostic predictions and assigns instance-specific penalties. We achieve state-of-the-art performance on the challenging YouTube-VOS dataset and benchmark, improving the Jaccard index and F-metric on unseen categories by 6.8% and 12.7%, respectively, while operating in real time (30.3 FPS).
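The Jaccard index cited in the abstract is the region-similarity measure J used in YouTube-VOS evaluation: the intersection over union between a predicted mask and its ground-truth mask, reported separately for object categories seen and unseen during training. The sketch below computes J per instance from integer label maps. The function names and the label-map encoding (0 = background, one integer id per instance) are assumptions for illustration; this is not the authors' evaluation code, and the contour measure F is not covered.

```python
import numpy as np


def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J = |pred AND gt| / |pred OR gt| for two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: count as a perfect match (a common convention
        # in DAVIS-style evaluation code).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def per_instance_jaccard(pred_labels: np.ndarray,
                         gt_labels: np.ndarray) -> dict[int, float]:
    """J for every instance id present in the ground-truth label map.

    Assumption (hypothetical encoding): all instances of a frame live in one
    integer map, 0 is background, and each instance has its own id, which is
    a natural output format when several instances are segmented in a single
    forward pass.
    """
    ids = sorted(int(i) for i in np.unique(gt_labels) if i != 0)
    return {i: jaccard(pred_labels == i, gt_labels == i) for i in ids}


# Usage: a 3x3 frame with two instances.
gt = np.array([[0, 1, 1],
               [0, 2, 2],
               [0, 0, 2]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2],
                 [0, 2, 2]])
scores = per_instance_jaccard(pred, gt)
print(scores)  # {1: 0.5, 2: 0.75}
```

Encoding every instance in one label map keeps the per-instance scores independent of how many objects a frame contains; averaging the dictionary values gives a frame-level mean J.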
Year
2021
DOI
10.1016/j.cviu.2021.103240
Venue
Computer Vision and Image Understanding
Keywords
65D19, 68T45, 68T10
DocType
Journal
Volume
210
Issue
1
ISSN
1077-3142
Citations
0
PageRank
0.34
References
0
Authors
7
Author order
1. Juan Carlos León
2. María Alejandra Bravo
3. Guillaume Jeanneret
4. Ali K. Thabet
5. Thomas Brox
6. Pablo Arbelaez
7. Bernard Ghanem