Title |
---|
Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation |
Abstract |
---|
Cross-modal retrieval has become an active research topic in recent years because of its theoretical and practical significance. This paper proposes a new technique for learning a deep visual-semantic embedding that is more effective and more interpretable for cross-modal retrieval. The proposed method uses a two-stage strategy. In the first stage, deep mutual information estimation is incorporated into the objective to maximize the mutual information between the input data and its embedding. In the second stage, an expelling branch is added to the network to disentangle modality-exclusive information from the learned representations. This helps reduce the impact of modality-exclusive information on the common subspace representation and improve the interpretability of the learned features. Extensive experiments on two large-scale benchmark datasets demonstrate that our method learns better visual-semantic embeddings and achieves state-of-the-art cross-modal retrieval results. |
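The abstract does not say which mutual information estimator the first stage uses. As a minimal sketch, assuming the Jensen-Shannon (JSD) estimator popularized by Deep InfoMax, the code below scores how well embeddings stay tied to their own inputs: matched pairs `(x_i, z_i)` act as samples from the joint distribution and mismatched pairs `(x_i, z_j)`, `i != j`, act as samples from the product of marginals. The names `critic`, `softplus`, and `jsd_mi_estimate` are hypothetical, and the trainable statistics network is replaced by a fixed dot-product critic so the example runs with the standard library alone; this is not the authors' implementation.

```python
import math
from typing import Sequence


def softplus(v: float) -> float:
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def critic(x: Sequence[float], z: Sequence[float]) -> float:
    """Hypothetical fixed dot-product critic T(x, z) = <x, z>.

    Assumption: a real model would use a trainable network here; the dot
    product only keeps the sketch framework-free.
    """
    if len(x) != len(z):
        raise ValueError("dot-product critic needs equal dimensions")
    return sum(a * b for a, b in zip(x, z))


def jsd_mi_estimate(inputs: Sequence[Sequence[float]],
                    embeddings: Sequence[Sequence[float]]) -> float:
    """Jensen-Shannon MI estimate (Deep InfoMax style).

    (inputs[i], embeddings[i]) are treated as joint samples; every
    (inputs[i], embeddings[j]) with i != j as a product-of-marginals sample.
    """
    n = len(inputs)
    if n != len(embeddings):
        raise ValueError("inputs and embeddings must be paired")
    if n < 2:
        raise ValueError("need at least two pairs to form negatives")
    joint_term = 0.0
    marginal_term = 0.0
    for i in range(n):
        for j in range(n):
            t = critic(inputs[i], embeddings[j])
            if i == j:
                joint_term += -softplus(-t)   # E_P[-sp(-T(x, z))]
            else:
                marginal_term += softplus(t)  # E_{P x Q}[sp(T(x, z'))]
    return joint_term / n - marginal_term / (n * (n - 1))


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    z_aligned = x                      # each embedding matches its input
    z_shuffled = [x[1], x[2], x[0]]    # embeddings assigned to the wrong inputs
    print(round(jsd_mi_estimate(x, z_aligned), 4))
    print(round(jsd_mi_estimate(x, z_shuffled), 4))
```

With these toy vectors the aligned pairing scores about -0.88 and the shuffled one about -1.84. In training, an encoder would be updated to increase this estimate. Note that the JSD form is a surrogate that rises as matched pairs become easier to tell apart from mismatched ones; it is not a bound on mutual information measured in nats.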
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3343031.3351053 | Proceedings of the 27th ACM International Conference on Multimedia |
Keywords | Field | DocType |
---|---|---|
cross-modal retrieval, disentangled representation learning, mutual information estimation | Computer vision, Interpretability, Embedding, Subspace topology, Computer science, Artificial intelligence, Mutual information, Modal, Machine learning | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-4503-6889-6 | 0 | 0.34 |
References | Authors |
---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Weikuo Guo | 1 | 0 | 0.68 |
Huaibo Huang | 2 | 29 | 10.81 |
Xiang-Wei Kong | 3 | 212 | 15.09 |
Ran He | 4 | 1790 | 108.39 |