Title
Object-Agnostic Transformers for Video Referring Segmentation
Abstract
Video referring segmentation focuses on segmenting out the object in a video based on the corresponding textual description. Previous works have primarily tackled this task by devising two crucial parts, an intra-modal module for context modeling and an inter-modal module for heterogeneous alignment. However, there are two essential drawbacks of this approach: (1) it lacks joint learning of context modeling and heterogeneous alignment, leading to insufficient interactions among input elements; (2) both modules require task-specific expert knowledge to design, which severely limits the flexibility and generality of prior methods. To address these problems, we here propose a novel Object-Agnostic Transformer-based Network, called OATNet, that simultaneously conducts intra-modal and inter-modal learning for video referring segmentation, without the aid of object detection or category-specific pixel labeling. More specifically, we first directly feed the sequence of textual tokens and visual tokens (pixels rather than detected object bounding boxes) into a multi-modal encoder, where context and alignment are simultaneously and effectively explored. We then design a novel cascade segmentation network to decouple our task into coarse-grained segmentation and fine-grained refinement. Moreover, considering the difficulty of samples, a more balanced metric is provided to better diagnose the performance of the proposed method. Extensive experiments on two popular datasets, A2D Sentences and J-HMDB Sentences, demonstrate that our proposed approach noticeably outperforms state-of-the-art methods.
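The abstract's central idea, feeding textual tokens and pixel-level visual tokens into one encoder so that context modeling and cross-modal alignment are learned together, can be illustrated with a minimal single-head self-attention sketch. This is not the OATNet implementation: the function name `joint_self_attention`, the token counts, the feature dimension, and the absence of learned projections are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(text_tokens, visual_tokens):
    """Single-head self-attention over one concatenated token sequence.

    Hypothetical simplification: queries, keys and values are the raw
    tokens themselves (no learned projection matrices).
    """
    tokens = text_tokens + visual_tokens  # one shared sequence for both modalities
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    # Each token scores every token, regardless of modality.
    weights = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))

    # Each output is the attention-weighted average of all tokens.
    outputs = [
        [sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


rng = random.Random(0)
dim = 4  # assumed feature size
text_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]    # 3 words
visual_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]  # 6 pixels

outputs, weights = joint_self_attention(text_tokens, visual_tokens)

# For the first pixel token, split its attention mass by target modality.
n_text = len(text_tokens)
pixel_row = weights[n_text]
inter_modal = sum(pixel_row[:n_text])  # pixel -> words (alignment)
intra_modal = sum(pixel_row[n_text:])  # pixel -> pixels (context)
```

Because both modalities share a single softmax, the intra-modal and inter-modal attention masses of a token compete within one distribution rather than being computed by two separately designed modules.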
Year: 2022
DOI: 10.1109/TIP.2022.3161832
Venue: IEEE TRANSACTIONS ON IMAGE PROCESSING
Keywords: Task analysis, Visualization, Transformers, Feature extraction, Object detection, Image segmentation, Context modeling, Video referring segmentation, multi-modal learning, video grounding
DocType: Journal
Volume: 31
Issue: 1
ISSN: 1057-7149
Citations: 0
PageRank: 0.34
References: 9
Authors: 5
Name, Order, Citations/PageRank
Xu Yang, 1, 458.16
Hao Wang, 2, 184.34
De Xie, 3, 91.84
Cheng Deng, 4, 128385.48
Dacheng Tao, 5, 19032747.78