Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation - Citegraph

Paper Info

Title
Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

Abstract
Cross-modal retrieval has become a hot research topic in recent years for its theoretical and practical significance. This paper proposes a new technique for learning a deep visual-semantic embedding that is more effective and interpretable for cross-modal retrieval. The proposed method uses a two-stage strategy. In the first stage, deep mutual information estimation is incorporated into the objective to maximize the mutual information between the input data and its embedding. In the second stage, an expelling branch is added to the network to disentangle the modality-exclusive information from the learned representations. This reduces the impact of modality-exclusive information on the common subspace representation and improves the interpretability of the learned features. Extensive experiments on two large-scale benchmark datasets demonstrate that our method can learn better visual-semantic embedding and achieve state-of-the-art cross-modal retrieval results.
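
The abstract does not say which mutual information estimator the first stage uses. As a rough illustration of the general idea only, the sketch below computes the InfoNCE lower bound on mutual information from a batch of paired embeddings. The choice of InfoNCE, the fixed cosine-similarity critic (a deep estimator would normally learn the critic), the function name `infonce_lower_bound`, and the `temperature` parameter are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def infonce_lower_bound(x_emb, y_emb, temperature=0.1):
    """InfoNCE lower bound on I(X; Y), in nats, from paired embeddings.

    Row i of x_emb is paired with row i of y_emb; every other row in the
    batch acts as a negative sample. The critic is a fixed cosine
    similarity (an assumption; it is not trained here).
    """
    # L2-normalise so the dot product is a cosine similarity.
    x = x_emb / np.linalg.norm(x_emb, axis=1, keepdims=True)
    y = y_emb / np.linalg.norm(y_emb, axis=1, keepdims=True)

    scores = x @ y.T / temperature  # (n, n) matrix of critic scores

    # Numerically stable row-wise log-softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

    n = x.shape[0]
    # The bound is log(n) plus the mean log-probability of the true pair.
    return float(np.mean(np.diag(log_softmax)) + np.log(n))


# Toy usage with synthetic data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
inputs = rng.normal(size=(64, 16))
embeddings = inputs + 0.1 * rng.normal(size=(64, 16))  # strongly dependent
unrelated = rng.normal(size=(64, 16))                  # independent of inputs

print(infonce_lower_bound(inputs, embeddings))
print(infonce_lower_bound(inputs, unrelated))
```

An embedding that preserves information about its input should score noticeably higher than an unrelated one, and the estimate can never exceed the logarithm of the batch size. Maximizing a bound of this kind during training is one common way to encourage an embedding to retain information about its input.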

Year	DOI	Venue
2019	10.1145/3343031.3351053	Proceedings of the 27th ACM International Conference on Multimedia
Keywords	Field	DocType
cross-modal retrieval, disentangled representation learning, mutual information estimation	Computer vision, Interpretability, Embedding, Subspace topology, Computer science, Artificial intelligence, Mutual information, Modal, Machine learning	Conference
ISBN	Citations	PageRank
978-1-4503-6889-6	0	0.34
References	Authors
0	4

Authors (4 rows)

Name	Order	Citations	PageRank
Weikuo Guo	1	0	0.68
Huaibo Huang	2	29	10.81
Xiang-Wei Kong	3	212	15.09
Ran He	4	1790	108.39