Title
Fusion layer attention for image-text matching
Abstract
Image-text matching aims to identify the relationship between image and text data and to establish a connection between the two modalities. The main challenge of image-text matching is that images and texts have different data distributions and feature representations. Current methods fall into two basic types: methods that map image and text data into a common space and then measure distances in that space, and methods that treat image-text matching as a classification problem. In both cases, only the two original modalities, image and text, are used. In our method, we create a fusion layer that extracts an intermediate modality, improving the image-text matching results. We also propose a concise update to the loss function that makes it easier for the neural network to handle difficult samples. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved matching results superior to those of existing methods.
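The abstract describes the approach only at a high level, so the PyTorch sketch below is a minimal illustration rather than the paper's actual implementation. The module name FusionLayerMatcher, the feature dimensions, the use of multi-head attention as the fusion layer, and the VSE++-style hinge loss with in-batch hard negatives (a common reading of "updating the loss function" in this line of work) are all assumptions not confirmed by this record.

```python
# Illustrative sketch only: common-space projection of the two modalities,
# a hypothetical attention-based fusion layer producing an intermediate
# representation, and a standard hard-negative triplet ranking loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionLayerMatcher(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=512):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Hypothetical fusion layer: attention over the image-text pair
        # yields an intermediate, cross-modal representation.
        self.fusion = nn.MultiheadAttention(embed_dim, num_heads=8,
                                            batch_first=True)

    def forward(self, img_feats, txt_feats):
        img = F.normalize(self.img_proj(img_feats), dim=-1)  # (B, D)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)  # (B, D)
        # Stack the two modality embeddings as a length-2 sequence and let
        # attention mix them into the intermediate "fused" mode.
        pair = torch.stack([img, txt], dim=1)                # (B, 2, D)
        fused, _ = self.fusion(pair, pair, pair)
        fused = F.normalize(fused.mean(dim=1), dim=-1)       # (B, D)
        return img, txt, fused


def triplet_loss_hard_negatives(img, txt, margin=0.2):
    # Hinge-based ranking loss keeping only the hardest in-batch negative
    # (as in VSE++); used here as a stand-in, since the paper's exact
    # loss formulation is not given in this record.
    scores = img @ txt.t()                  # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)         # matched-pair similarities
    cost_img = (margin + scores - pos).clamp(min=0)      # image anchors
    cost_txt = (margin + scores - pos.t()).clamp(min=0)  # caption anchors
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_img = cost_img.masked_fill(mask, 0)
    cost_txt = cost_txt.masked_fill(mask, 0)
    # Hardest negative caption per image (rows) and image per caption (cols).
    return cost_img.max(dim=1)[0].mean() + cost_txt.max(dim=0)[0].mean()


# Example with random features standing in for CNN / text-encoder outputs.
model = FusionLayerMatcher()
img_emb, txt_emb, fused = model(torch.randn(32, 2048), torch.randn(32, 768))
loss = triplet_loss_hard_negatives(img_emb, txt_emb)
```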
Year
2021
DOI
10.1016/j.neucom.2021.01.124
Venue
Neurocomputing
Keywords
Deep learning, Image-text matching, Multimodal, Retrieval
DocType
Journal
Volume
442
ISSN
0925-2312
Citations
2
PageRank
0.42
References
0
Authors
8
Name           Order   Citations   PageRank
Depeng Wang    1       2           0.42
Liejun Wang    2       7           2.86
Shiji Song     3       1247        94.76
Gao Huang      4       875         53.36
Yuchen Guo     5       710         35.96
Shuli Cheng    6       6           7.59
Naixiang Ao    7       2           0.42
Anyu Du        8       4           4.19