Cross-Media Image-Text Retrieval Combined with Global Similarity and Local Similarity - Citegraph

Paper Info

Title
Cross-Media Image-Text Retrieval Combined with Global Similarity and Local Similarity

Abstract
In this paper, we study image-text matching, with the aim of improving the semantic match between images and text. Previous work either uses a pre-trained network to extract image and text features and projects them directly into a common subspace, varies the loss function on that basis, or uses an attention mechanism to match image region proposals directly with text phrases. None of these approaches matches the semantics of the image and the text well. We propose a cross-media retrieval method based on both global and local representations. We construct a cross-media two-level network, containing subnets that handle global and local features, to explore better semantic matching between images and text. Specifically, we use a self-attention network to obtain a macro representation of the whole image, and we also use local fine-grained patches with an attention mechanism. A two-level alignment framework then lets the two levels promote each other in learning different representations for cross-media retrieval. The innovation of this study lies in using more comprehensive image and text features to design two kinds of similarity and combining them additively. Experimental results on the Flickr30K and MS-COCO datasets show that the method is effective for image-text retrieval and achieves better recall than many current advanced cross-media retrieval models.
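The abstract says only that a global similarity and a local similarity are added together, without giving the formula. As a rough illustration of that idea, the sketch below scores an image-text pair in pure Python. Everything specific in it is an assumption made for illustration and is not taken from the paper: cosine similarity for the global score, a word-over-regions soft attention for the local score, the `temperature` parameter, the mixing weight `alpha`, and all function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_similarity(regions, words, temperature=4.0):
    """Hypothetical local score (assumption, not the paper's formula).

    Each word vector attends over the image region vectors; the score is the
    mean cosine similarity between each word and its attended region vector.
    Both lists are assumed to be non-empty.
    """
    scores = []
    for w in words:
        weights = softmax([temperature * cosine(w, r) for r in regions])
        attended = [
            sum(wt * r[i] for wt, r in zip(weights, regions))
            for i in range(len(w))
        ]
        scores.append(cosine(w, attended))
    return sum(scores) / len(scores)


def combined_similarity(img_global, txt_global, regions, words, alpha=0.5):
    """Weighted sum of global and local similarity; alpha is an assumed weight."""
    global_score = cosine(img_global, txt_global)
    local_score = local_similarity(regions, words)
    return alpha * global_score + (1.0 - alpha) * local_score
```

In a retrieval setting, a score like `combined_similarity` would be computed for every candidate image-text pair and the candidates ranked by it; the recall rates mentioned in the abstract are measured on such rankings.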

Year	2019
DOI	10.1109/DSAA.2019.00029
Venue	2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Keywords	convolutional neural network, self-attention network, attention mechanism, two-level network, cross-media retrieval
Field	Information retrieval, Subspace topology, Recall rate, Convolutional neural network, Computer science, Cross media, Macro, Text retrieval, Semantics, Semantic matching
DocType	Conference
ISSN	2472-1573
ISBN	978-1-7281-4494-8
Citations	0
PageRank	0.34
References	9
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Zhixin Li	1	12	19.62
Feng Ling	2	1	1.41
Canlong Zhang	3	5	8.55
