Abstract | ||
---|---|---|
Video referring segmentation aims to segment the object in a video that a given textual description refers to. Previous works have primarily tackled this task with two key components: an intra-modal module for context modeling and an inter-modal module for heterogeneous alignment. However, this approach has two essential drawbacks: (1) it lacks joint learning of context modeling and heterogeneous alignment, leading to insufficient interaction among input elements; (2) both modules require task-specific expert knowledge to design, which severely limits the flexibility and generality of prior methods. To address these problems, we propose a novel Object-Agnostic Transformer-based Network, called OATNet, that simultaneously conducts intra-modal and inter-modal learning for video referring segmentation, without the aid of object detection or category-specific pixel labeling. More specifically, we first directly feed the sequence of textual tokens and visual tokens (pixels rather than detected object bounding boxes) into a multi-modal encoder, where context and alignment are explored simultaneously. We then design a novel cascade segmentation network that decouples the task into coarse-grained segmentation and fine-grained refinement. Moreover, to account for the varying difficulty of samples, we provide a more balanced metric to better diagnose the performance of the proposed method. Extensive experiments on two popular datasets, A2D Sentences and J-HMDB Sentences, demonstrate that our proposed approach noticeably outperforms state-of-the-art methods. |
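
The joint encoding idea in the abstract (textual and visual tokens fed into one encoder so that context and alignment are learned together) can be illustrated with a toy sketch. This is not OATNet's implementation: the single attention head, the random weights, the token counts, and all function names (`joint_self_attention`, `random_matrix`, and so on) are assumptions made purely for illustration. It only shows that, once both modalities are concatenated into one sequence, a single attention map contains both intra-modal and inter-modal weights.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def random_matrix(rows, cols, rng, scale=1.0):
    """Hypothetical helper: Gaussian entries, used for toy tokens and weights."""
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)] for _ in range(rows)]


def joint_self_attention(text_tokens, visual_tokens, w_q, w_k, w_v):
    """Single-head self-attention over the concatenated token sequence.

    Because text and visual tokens share one sequence, every token attends
    to tokens of both modalities in the same step.
    """
    tokens = text_tokens + visual_tokens  # (T + V) tokens, each of size d
    d = len(tokens[0])
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scores = matmul(q, transpose(k))
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(attn, v), attn


rng = random.Random(0)
d, num_text, num_visual = 8, 3, 4  # assumed toy sizes: 3 words, a 2x2 pixel grid
text = random_matrix(num_text, d, rng)
visual = random_matrix(num_visual, d, rng)
w_q, w_k, w_v = (random_matrix(d, d, rng, scale=1 / math.sqrt(d)) for _ in range(3))

out, attn = joint_self_attention(text, visual, w_q, w_k, w_v)

# Text-to-visual block of the attention map: the inter-modal part.
cross_block = [row[num_text:] for row in attn[:num_text]]
# Text-to-text block: the intra-modal part, produced by the same attention call.
intra_block = [row[:num_text] for row in attn[:num_text]]
```

Here `intra_block` and `cross_block` are slices of one attention matrix, which is the sense in which a joint encoder handles context modeling and cross-modal alignment in a single step rather than in two separately designed modules.
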
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TIP.2022.3161832 | IEEE Transactions on Image Processing |
Keywords | DocType | Volume |
Task analysis, Visualization, Transformers, Feature extraction, Object detection, Image segmentation, Context modeling, Video referring segmentation, multi-modal learning, video grounding | Journal | 31 |
Issue | ISSN | Citations |
1 | 1057-7149 | 0 |
PageRank | References | Authors |
0.34 | 9 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xu Yang | 1 | 45 | 8.16 |
Hao Wang | 2 | 18 | 4.34 |
De Xie | 3 | 9 | 1.84 |
Cheng Deng | 4 | 1283 | 85.48 |
Dacheng Tao | 5 | 19032 | 747.78 |