Abstract
In this work, we address the task of referring image segmentation (RIS), which aims at predicting a segmentation mask for the object described by a natural language expression. Most existing methods focus on establishing unidirectional or bidirectional relationships between visual and linguistic features to associate the two modalities, while multi-scale context is ignored or insufficiently modeled. Multi-scale context is crucial for localizing and segmenting objects with large scale variations during the multi-modal fusion process. To solve this problem, we propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel and further introduces a cascaded branch to fuse visual and linguistic features. The cascaded branch can progressively integrate multi-scale contextual information and facilitate the alignment of the two modalities during the multi-modal fusion process. Experimental results on four benchmark datasets demonstrate that our method outperforms most state-of-the-art methods. Code is available at https://github.com/jianhua2022/CMF-Refseg.
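The abstract describes a fusion module built from parallel atrous (dilated) convolutions plus a cascaded branch that progressively accumulates multi-scale context. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the class name `CascadedMultiModalFusion`, the dilation rates, the `fused_dim` size, the tile-and-concatenate fusion of the sentence embedding, and the ReLU activations are all assumptions for illustration; the released code at the repository above is authoritative.

```python
import torch
import torch.nn as nn


class CascadedMultiModalFusion(nn.Module):
    """Hypothetical sketch of a CMF-style module: parallel atrous convolutions
    over fused visual-linguistic features, with a cascaded branch that
    progressively integrates the multi-scale context. Layer sizes and the
    fusion operator are assumptions, not the paper's exact design."""

    def __init__(self, vis_dim, lang_dim, fused_dim, dilations=(1, 3, 5, 7)):
        super().__init__()
        # Project both modalities to a common channel dimension.
        self.vis_proj = nn.Conv2d(vis_dim, fused_dim, kernel_size=1)
        self.lang_proj = nn.Conv2d(lang_dim, fused_dim, kernel_size=1)
        # Parallel atrous convolutions capture context at several scales.
        self.atrous = nn.ModuleList([
            nn.Conv2d(2 * fused_dim, fused_dim, kernel_size=3,
                      padding=d, dilation=d)
            for d in dilations
        ])
        # Cascaded branch: one refinement step per scale.
        self.cascade = nn.ModuleList([
            nn.Conv2d(fused_dim, fused_dim, kernel_size=3, padding=1)
            for _ in dilations
        ])
        self.out = nn.Conv2d(fused_dim, fused_dim, kernel_size=1)

    def forward(self, vis_feat, lang_feat):
        # vis_feat: (B, vis_dim, H, W); lang_feat: (B, lang_dim) sentence embedding.
        b, _, h, w = vis_feat.shape
        # Tile the sentence embedding over the spatial grid and fuse by concatenation.
        lang = lang_feat.view(b, -1, 1, 1).expand(-1, -1, h, w)
        fused = torch.cat([self.vis_proj(vis_feat), self.lang_proj(lang)], dim=1)

        context = 0
        for atrous_conv, cascade_conv in zip(self.atrous, self.cascade):
            branch = torch.relu(atrous_conv(fused))
            # Progressively integrate each new scale into the running context.
            context = torch.relu(cascade_conv(branch + context))
        return self.out(context)


# Example usage with placeholder feature sizes (assumed, not from the paper):
# cmf = CascadedMultiModalFusion(vis_dim=2048, lang_dim=1024, fused_dim=512)
# out = cmf(torch.randn(2, 2048, 26, 26), torch.randn(2, 1024))  # (2, 512, 26, 26)
```

The design intent mirrored here is that each cascaded step sees both a new dilation rate and the context accumulated from smaller rates, so objects of very different sizes can be aligned with the expression in a single fusion pass.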
Year | DOI | Venue
---|---|---
2021 | 10.1109/ICIP42928.2021.9506483 | ICIP

DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34

References | Authors
---|---
0 | 4

Name | Order | Citations | PageRank |
---|---|---|---
Jianhua Yang | 1 | 80 | 12.95 |
Yan Huang | 2 | 226 | 27.65 |
Zhanyu Ma | 3 | 539 | 55.74 |
Liang Wang | 4 | 4317 | 243.28 |