Title | ||
---|---|---|
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer. |
Abstract | ||
---|---|---|
State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an attention computation ignores the multi-scale spatio-temporal feature relationships that are crucial to tackle target appearance deformations in videos. To address this issue, we propose a transformer-based VIS framework, named MS-STS VIS, that comprises a novel multi-scale spatio-temporal split (MS-STS) attention module in the encoder. The proposed MS-STS module effectively captures spatio-temporal feature relationships at multiple scales across frames in a video. We further introduce an attention block in the decoder to enhance the temporal consistency of the detected instances in different frames of a video. Moreover, an auxiliary discriminator is introduced during training to ensure better foreground-background separability within the multi-scale spatio-temporal feature space. We conduct extensive experiments on two benchmarks: YouTube-VIS 2019 and YouTube-VIS 2021. Our MS-STS VIS achieves state-of-the-art performance on both benchmarks. When using the ResNet50 backbone, our MS-STS VIS achieves a mask AP of 50.1% on the YouTube-VIS 2019 val. set, outperforming the best reported results in the literature by 2.7% overall and by 4.8% at the higher overlap threshold of \(\text{AP}_{75}\), while being comparable in model size and speed. When using the Swin Transformer backbone, MS-STS VIS achieves a mask AP of 61.0% on the YouTube-VIS 2019 val. set. Source code is available at https://github.com/OmkarThawakar/MSSTS-VIS. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/978-3-031-19818-2_38 | European Conference on Computer Vision |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
0 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Omkar Thawakar | 1 | 0 | 0.34 |
Sanath Narayan | 2 | 0 | 1.35 |
Jiale Cao | 3 | 98 | 8.46 |
Hisham Cholakkal | 4 | 48 | 8.40 |
Muhammad Anwer Rao | 5 | 129 | 11.22 |
Khan Muhammad Haris | 6 | 184 | 6.25 |
Salman Khan | 7 | 387 | 41.05 |
Michael Felsberg | 8 | 2419 | 130.29 |
Fahad Shahbaz Khan | 9 | 1622 | 69.24 |