Abstract |
---|
Cross-modal retrieval aims to retrieve semantically similar examples in one modality by using a query from another modality. Its typical applications include image-based text retrieval (IBTR) and text-based image retrieval (TBIR). Due to the rapid growth of multimodal data and the success of deep learning, cross-modal retrieval has received increasing attention and achieved significant progress in recent years. Dual-path CNN is a novel framework in this domain, which yields competitive performance by utilizing an instance loss and an inter-modal loss. However, it models the intra-modal relationship less discriminatively, and that relationship is also important for building a more discriminative cross-modal embedding network. To this end, we propose to incorporate an additional intra-modal loss into the framework, which remedies this problem by preserving the intra-modal structure. Further, we develop a novel batch flexible sampling approach to train the entire network effectively and efficiently. Our approach, named Discriminative Dual-Path CNN (DDPC), achieves state-of-the-art results on the MS-COCO dataset, improving IBTR by 4.9% and TBIR by 5.9% in Recall@1 on the 5K test set. |
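
The abstract reports its gains in Recall@1. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes Recall@K from a query-by-candidate similarity matrix. The function name `recall_at_k`, the toy scores, and the ground-truth sets are all hypothetical and are not taken from the paper.

```python
def recall_at_k(similarity, relevant, k=1):
    """Fraction of queries with at least one relevant item in the top-k.

    similarity: one row per query, each row a list of scores per candidate.
    relevant:   relevant[i] is the set of candidate indices matching query i.
    (Hypothetical helper for illustration; not from the paper.)
    """
    hits = 0
    for scores, gold in zip(similarity, relevant):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in gold for j in ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy data (made up): 3 image queries scored against 4 candidate captions.
sim = [
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.4, 0.8, 0.1],
    [0.3, 0.2, 0.1, 0.7],
]
gold = [{0}, {1}, {3}]

r1 = recall_at_k(sim, gold, k=1)  # queries 0 and 2 hit at rank 1
r2 = recall_at_k(sim, gold, k=2)  # query 1 also hits within the top 2
```

For IBTR the rows would be image queries and the columns candidate sentences; for TBIR the roles are swapped. On MS-COCO each image has several captions, which is why the ground truth is modeled here as a set per query.
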
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/978-3-030-00764-5_35 | ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III |
Keywords | Field | DocType |
Cross-modal retrieval,Convolutional neural network,Intra-modal,Inter-modal,Embedding space | Embedding,Pattern recognition,Convolutional neural network,Computer science,Bridging (networking),Image retrieval,Artificial intelligence,Deep learning,Discriminative model,Modal,Test set | Conference |
Volume | ISSN | Citations |
11166 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 22 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Haoran Wang | 1 | 81 | 6.77 |
Zhong Ji | 2 | 169 | 23.08 |
Yanwei Pang | 3 | 1798 | 91.55 |