Title
Weakly supervised video object segmentation initialized with referring expression
Abstract
With the aid of one manually annotated frame, One-Shot Video Object Segmentation (OSVOS) uses a CNN architecture to tackle the problem of semi-supervised video object segmentation (VOS). However, annotating a pixel-level segmentation mask is expensive and time-consuming. To alleviate this problem, we explore a language-interactive way of initializing semi-supervised VOS, so that semi-supervised methods run in a weakly supervised mode. Our contributions are twofold: (i) we propose a variant of OSVOS initialized with referring expressions (REVOS), which locates a target object by maximizing the matching score between the candidates and the referring expression; (ii) because the segmentation performance of semi-supervised VOS methods varies dramatically depending on which frame is selected for annotation, we present a strategy for selecting the best annotation frame using an image similarity measurement. We are also the first to propose a multiple-frame annotation selection strategy for initializing semi-supervised VOS with more than one annotated frame. Finally, we evaluate our method on the DAVIS-2016 dataset, and experimental results show that REVOS achieves performance (79.94% average IoU) similar to that of OSVOS (80.1%). Although the current REVOS implementation is specific to one-shot video object segmentation, the approach can be applied more widely to other semi-supervised VOS methods.
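The two selection steps named in the abstract (choosing the candidate that maximizes a matching score, and choosing an annotation frame by image similarity) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the abstract does not specify the matching score, the similarity measure, or the feature representation, so cosine similarity over plain feature vectors is swapped in, and the function names `locate_target` and `select_annotation_frame` are hypothetical rather than taken from REVOS.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def locate_target(candidate_feats, expr_feat):
    """Return (index, scores) of the candidate with the highest matching score.

    Assumption: the matching score is cosine similarity between a candidate
    object's feature vector and the referring expression's embedding.
    """
    scores = [cosine(c, expr_feat) for c in candidate_feats]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def select_annotation_frame(frame_feats):
    """Return the index of the frame most similar, on average, to the others.

    Assumption: the most representative frame is a reasonable stand-in for
    the "best annotation frame"; the paper's actual criterion may differ.
    """
    n = len(frame_feats)

    def mean_similarity(i):
        sims = [cosine(frame_feats[i], frame_feats[j]) for j in range(n) if j != i]
        return sum(sims) / len(sims) if sims else 0.0

    return max(range(n), key=mean_similarity)
```

With toy two-dimensional features, `locate_target([[1, 0], [0, 1], [0.7, 0.7]], [0, 1])` picks the second candidate, and `select_annotation_frame([[1, 0], [0.7, 0.7], [0, 1]])` picks the middle frame, since it is the one closest to both of the others.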
Year: 2021
DOI: 10.1016/j.neucom.2020.06.129
Venue: Periodicals
Keywords: Video Object Segmentation, Referring Expression, Natural Language Processing
DocType: Journal
Volume: 453
Issue: C
ISSN: 0925-2312
Citations: 0
PageRank: 0.34
References: 0
Authors: 7
Name            Order   Citations   PageRank
XiaoQing Bu     1       0           0.34
Yukuan Sun      2       0           1.69
Jianming Wang   3       15          4.60
Kunliang Liu    4       0           1.69
Jiayu Liang     5       2           3.40
GuangHao Jin    6       0           0.34
Tae-Sun Chung   7       808         70.33