Title
Learning interactive multi-object segmentation through appearance embedding and spatial attention
Abstract
Deep learning approaches to interactive image segmentation are typically formulated as a binary labeling problem. A model trained to make predictions within a fixed set of labels (i.e., foreground and background) cannot be used to directly predict binary masks for multiple objects of interest, which greatly limits its flexibility and adaptability. Instead, different classes of clicks are used as input, and the first end-to-end learning model for multi-object segmentation, based on a newly designed neural network, is developed. The network consists of a visual feature extractor, a recurrent attention module and a dynamic segmentation head; it extracts user-click-adapted appearance embedding features and spatial attention features, and then learns to transform this information into a segmentation of multiple objects. The network is trained with a joint loss function that takes embedding learning into account for segmentation. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. It performs favorably against state-of-the-art approaches on the multi-object segmentation task, for example running at 0.15 s per image (0.06 s per object) with a mean IoU & F1 score of 84.90% on the Pascal VOC 2012 validation set. It is further shown that the method can be used in numerous vision applications such as image recoloring and colorization.
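The three-stage pipeline named in the abstract (visual feature extractor, recurrent attention module, dynamic segmentation head) can be sketched at a very high level. The following is a minimal illustrative sketch, not the paper's actual method: all shapes, the Gaussian click-centred attention maps, and the similarity-based label assignment are assumptions made here for illustration, since the record gives no architectural details.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 4, 3  # hypothetical feature-map size, channels, object count

def feature_extractor(image):
    # Stand-in for a CNN backbone: linearly project RGB to C-channel features.
    proj = rng.standard_normal((image.shape[-1], C))
    return image @ proj                                   # (H, W, C)

def click_attention(clicks):
    # One normalized spatial attention map per object, peaked at its click.
    # (The paper's module is recurrent; a fixed Gaussian is used here.)
    yy, xx = np.mgrid[0:H, 0:W]
    maps = np.stack([np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / 8.0)
                     for (y, x) in clicks])               # (K, H, W)
    return maps / maps.sum(axis=(1, 2), keepdims=True)

def segmentation_head(features, attn):
    # Attention-weighted per-object appearance embeddings, then per-pixel
    # assignment by embedding similarity (a crude stand-in for the dynamic head).
    emb = np.einsum('khw,hwc->kc', attn, features)        # (K, C)
    logits = np.einsum('hwc,kc->hwk', features, emb)      # (H, W, K)
    return logits.argmax(-1)                              # (H, W) label map

image = rng.standard_normal((H, W, 3))
clicks = [(1, 1), (4, 6), (6, 2)]                         # one click per object
labels = segmentation_head(feature_extractor(image), click_attention(clicks))
print(labels.shape)  # (8, 8)
```

Each pixel receives the index of the object whose click-conditioned embedding it most resembles, which mirrors (in spirit only) how click-adapted appearance embeddings drive the multi-object prediction.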
Year: 2022
DOI: 10.1049/ipr2.12520
Venue: IET IMAGE PROCESSING
DocType: Journal
Volume: 16
Issue: 10
ISSN: 1751-9659
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name             Order  Citations  PageRank
Yan Gui          1      0          0.34
Bingqiang Zhou   2      0          0.68
Jianming Zhang   3      257        29.85
Cheng Sun        4      0          0.34
Lingyun Xiang    5      0          0.34
Jin Zhang        6      0          0.34