Title
DRSL: Deep Relational Similarity Learning for Cross-modal Retrieval
Abstract
Cross-modal retrieval aims to retrieve relevant samples across different media modalities. Existing cross-modal retrieval approaches rely on learning common representations for all modalities, implicitly assuming that each modality carries an equal amount of information. However, because the quantity of information in cross-modal samples is unbalanced and unequal, it is inappropriate to directly match the obtained modality-specific representations across different modalities in a common space. In this paper, we propose a new method called Deep Relational Similarity Learning (DRSL) for cross-modal retrieval. Unlike existing approaches, the proposed DRSL bridges the heterogeneity gap between modalities by directly learning the natural pairwise similarities instead of explicitly learning a common space. DRSL is a deep hybrid framework that integrates a relation network module for relation learning, capturing an implicit nonlinear distance metric. To the best of our knowledge, DRSL is the first approach that incorporates relation networks into the cross-modal learning scenario. Comprehensive experimental results show that the proposed DRSL model achieves state-of-the-art results on cross-modal retrieval tasks over four widely used benchmark datasets, i.e., Wikipedia, Pascal Sentences, NUS-WIDE-10K, and XMediaNet.
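The abstract describes scoring image-text pairs with a relation network rather than matching modality-specific representations in a common space. The following is a minimal sketch of that idea, assuming PyTorch, separate encoders per modality, and a small relation module that scores concatenated pairs; all module names, layer sizes, and feature dimensions are illustrative assumptions, not the authors' exact DRSL architecture.

```python
# Sketch of a relation-network-style cross-modal similarity scorer.
# Layer sizes and encoders are assumptions for illustration only.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps modality-specific features (e.g., CNN image features or
    text embeddings) to representations of a shared dimensionality."""
    def __init__(self, in_dim, hid_dim=1024, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, out_dim), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class RelationModule(nn.Module):
    """Scores a concatenated cross-modal pair; the learned network acts
    as an implicit nonlinear distance metric between modalities."""
    def __init__(self, feat_dim=512, hid_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1), nn.Sigmoid(),  # similarity in [0, 1]
        )
    def forward(self, img_feat, txt_feat):
        return self.net(torch.cat([img_feat, txt_feat], dim=-1)).squeeze(-1)

# Usage: score every image against every text in a batch to build the
# similarity matrix used for retrieval ranking.
img_enc = ModalityEncoder(in_dim=4096)   # e.g., CNN fc7 features (assumed)
txt_enc = ModalityEncoder(in_dim=300)    # e.g., averaged word vectors (assumed)
relation = RelationModule()

imgs, txts = torch.randn(8, 4096), torch.randn(8, 300)
fi, ft = img_enc(imgs), txt_enc(txts)                  # (8, 512) each
scores = relation(fi.unsqueeze(1).expand(-1, 8, -1),   # (8, 8) pairwise
                  ft.unsqueeze(0).expand(8, -1, -1))   # similarity scores
```

In such a design, pairwise similarities would be supervised directly (e.g., pushing matched pairs toward 1 and mismatched pairs toward 0), so no explicit common space has to be aligned across modalities.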
Year
2021
DOI
10.1016/j.ins.2020.08.009
Venue
Information Sciences
Keywords
Cross-modal retrieval, Relation network, Relational similarity learning, Heterogeneity gap
DocType
Journal
Volume
546
ISSN
0020-0255
Citations
1
PageRank
0.37
References
0
Authors
4
Name          | Order | Citations | PageRank
Xu Wang       | 1     | 21        | 1.97
Peng Hu       | 2     | 71        | 9.06
Liangli Zhen  | 3     | 72        | 9.73
Dezhong Peng  | 4     | 285       | 27.92