MAIN: Multi-Attention Instance Network for video segmentation

Paper Info

Title
MAIN: Multi-Attention Instance Network for video segmentation

Abstract
Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Network (MAIN), overcomes challenging segmentation scenarios over arbitrary videos without modeling sequence- or instance-specific knowledge. We design MAIN to segment multiple instances in a single forward pass, and optimize it with a novel loss function that favors class-agnostic predictions and assigns instance-specific penalties. We achieve state-of-the-art performance on the challenging YouTube-VOS dataset and benchmark, improving the unseen Jaccard and F-Metric by 6.8% and 12.7% respectively, while operating in real time (30.3 FPS).
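
For readers unfamiliar with the Jaccard metric cited in the abstract, the following is a minimal sketch of the Jaccard index (intersection over union) between a predicted and a ground-truth binary mask. It is not the benchmark's official evaluation code. The function name `jaccard`, the list-of-lists mask layout, and the convention for two empty masks are assumptions made for illustration only.

```python
def jaccard(pred, gt):
    """Jaccard index (intersection over union) of two binary masks.

    Both masks are equal-sized 2D lists of 0/1 values (an assumed layout).
    """
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1  # pixel is foreground in both masks
            if p or g:
                union += 1  # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
score = jaccard(pred, gt)
```

For the toy masks above the intersection is 2 pixels and the union is 4, so `score` is 0.5. A video-level figure would typically average such per-frame, per-instance scores.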

Year	2021
DOI	10.1016/j.cviu.2021.103240
Venue	Computer Vision and Image Understanding
Keywords	65D19,68T45,68T10
DocType	Journal
Volume	210
Issue	1
ISSN	1077-3142
Citations	0
PageRank	0.34
References	0
Authors	7

Authors (7 rows)

Name	Order	Citations	PageRank
Juan Carlos León	1	1	3.12
María Alejandra Bravo	2	0	0.34
Guillaume Jeanneret	3	0	1.35
Ali K. Thabet	4	19	7.10
Thomas Brox	5	7866	327.52
Pablo Arbelaez	6	3626	173.00
Bernard Ghanem	7	1487	81.44
