Title
Fine-Grained Correlation Learning with Stacked Co-attention Networks for Cross-Modal Information Retrieval.
Abstract
Cross-modal retrieval provides a flexible way to find semantically relevant information across modalities given a query from one modality. The main challenge is measuring the similarity between data of different modalities. In general, different modalities carry unequal amounts of information when describing the same semantics; for example, textual descriptions often contain background information that cannot be conveyed by images, and vice versa. Existing works mostly map the global features of different modalities into a common semantic space to measure their similarity, which ignores their imbalanced and complementary relationships. In this paper, we propose stacked co-attention networks (SCANet) to progressively learn the mutually attended features of different modalities and leverage these fine-grained correlations to enhance cross-modal retrieval performance. SCANet adopts a dual-path end-to-end framework to jointly learn the multimodal representations, stacked co-attention, and similarity metric. Experimental results on three widely used benchmark datasets verify that SCANet outperforms state-of-the-art methods, with an average MAP improvement of 19% in the best case.
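The abstract describes the co-attention mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of a single co-attention step between image-region and word features; the class name CoAttentionLayer, the bilinear affinity, the max-pooled attention, and the cosine-similarity scoring are our assumptions for illustration, not the paper's actual formulation.

```python
# Minimal sketch (our assumptions, not the paper's code): one co-attention
# step between image-region features and word features. Dimensions and the
# affinity formulation are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionLayer(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        # Bilinear affinity weight W, so C[b, r, t] = img[b, r] @ W @ txt[b, t]
        self.affinity = nn.Linear(dim, dim, bias=False)

    def forward(self, img: torch.Tensor, txt: torch.Tensor):
        # img: (B, R, D) region features; txt: (B, T, D) word features
        C = torch.bmm(self.affinity(img), txt.transpose(1, 2))  # (B, R, T)
        # Each modality attends to the other through the shared affinity matrix.
        att_img = F.softmax(C.max(dim=2).values, dim=1)  # (B, R) region weights
        att_txt = F.softmax(C.max(dim=1).values, dim=1)  # (B, T) word weights
        img_vec = torch.bmm(att_img.unsqueeze(1), img).squeeze(1)  # (B, D)
        txt_vec = torch.bmm(att_txt.unsqueeze(1), txt).squeeze(1)  # (B, D)
        return img_vec, txt_vec

def score(img_vec: torch.Tensor, txt_vec: torch.Tensor) -> torch.Tensor:
    # Cosine similarity of the mutually attended features as the retrieval score.
    return F.cosine_similarity(img_vec, txt_vec, dim=-1)
```

Stacking several such layers, with each layer's attention weights re-weighting the region and word features fed to the next, would approximate the progressive refinement of mutually attended features that the abstract describes.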
Year: 2018
Venue: KSEM
Field: Modalities, Information retrieval, Computer science, Correlation, Versa, Modal, Semantics, Semantic space
DocType: Conference
Citations: 0
PageRank: 0.34
References: 16
Authors: 6
Name | Order | Citations | PageRank
Yuhang Lu | 1 | 17 | 4.62
Jing Yu | 2 | 123 | 20.30
Yanbing Liu | 3 | 19 | 12.33
Jianlong Tan | 4 | 132 | 22.14
Li Guo | 5 | 224 | 16.28
Weifeng Zhang | 6 | 29 | 8.24