Title
Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation
Abstract
Cross-modal retrieval has become an active research topic in recent years because of its theoretical and practical significance. This paper proposes a new technique for learning a deep visual-semantic embedding that is more effective and interpretable for cross-modal retrieval. The proposed method employs a two-stage strategy to fulfill the task. In the first stage, deep mutual information estimation is incorporated into the objective to maximize the mutual information between the input data and its embedding. In the second stage, an expelling branch is added to the network to disentangle the modality-exclusive information from the learned representations. This helps to reduce the impact of modality-exclusive information on the common-subspace representation and improves the interpretability of the learned features. Extensive experiments on two large-scale benchmark datasets demonstrate that our method can learn better visual-semantic embeddings and achieve state-of-the-art cross-modal retrieval results.
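The abstract only outlines the two stages at a high level. Below is a minimal, self-contained sketch of how such components are commonly built; it is not the authors' code. The JSD-style mutual-information lower bound (in the spirit of Deep InfoMax/MINE estimators), the gradient-reversal "expelling" branch, and all names and layer sizes (Encoder, MIEstimator, jsd_mi_lower_bound, GradReverse) are assumptions made for illustration.

    # Illustrative sketch only: NOT the paper's released implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        """Maps a modality-specific feature vector into the shared embedding space."""
        def __init__(self, in_dim, emb_dim=256):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, emb_dim))
        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)

    class MIEstimator(nn.Module):
        """Discriminator T(x, z) used in a Jensen-Shannon MI lower bound."""
        def __init__(self, x_dim, z_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(x_dim + z_dim, 512), nn.ReLU(),
                                     nn.Linear(512, 1))
        def forward(self, x, z):
            return self.net(torch.cat([x, z], dim=-1))

    def jsd_mi_lower_bound(T, x, z):
        """Stage 1 idea: maximize MI between the input x and its embedding z.
        Positive pairs are (x_i, z_i); negatives pair x with shuffled embeddings."""
        z_neg = z[torch.randperm(z.size(0))]
        pos = -F.softplus(-T(x, z)).mean()    # expectation over joint pairs
        neg = F.softplus(T(x, z_neg)).mean()  # expectation over product of marginals
        return pos - neg                      # larger value = higher estimated MI

    class GradReverse(torch.autograd.Function):
        """Stage 2 idea: a gradient-reversal layer lets an 'expelling' branch fit
        modality-exclusive information while pushing the encoder to discard it."""
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad):
            return -grad

    if __name__ == "__main__":
        img_feat = torch.randn(32, 2048)          # e.g. precomputed CNN image features
        enc = Encoder(2048)
        T = MIEstimator(2048, 256)
        z = enc(img_feat)
        mi = jsd_mi_lower_bound(T, img_feat, z)   # maximize this term in Stage 1
        modality_head = nn.Linear(256, 2)         # expelling branch: modality classifier
        logits = modality_head(GradReverse.apply(z))
        print(mi.item(), logits.shape)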
Year
2019
DOI
10.1145/3343031.3351053
Venue
Proceedings of the 27th ACM International Conference on Multimedia
Keywords
cross-modal retrieval, disentangled representation learning, mutual information estimation
Field
Computer vision, Interpretability, Embedding, Subspace topology, Computer science, Artificial intelligence, Mutual information, Modal, Machine learning
DocType
Conference
ISBN
978-1-4503-6889-6
Citations
0
PageRank
0.34
References
0
Authors
4
Name, Order, Citations, PageRank
Weikuo Guo, 1, 0, 0.68
Huaibo Huang, 2, 29, 10.81
Xiang-Wei Kong, 3, 212, 15.09
Ran He, 4, 1790, 108.39