Abstract |
---|
Image-text matching aims to discover the relationship between image and text data and to establish a connection between them. The main challenge is that images and texts have different data distributions and feature representations. Current methods fall into two basic types: those that map image and text data into a common space and measure distances there, and those that treat image-text matching as a classification problem. In both cases, only the two original modalities, image and text, are used. In our method, we create a fusion layer to extract intermediate modalities, thereby improving image-text matching results. We also propose a concise update to the loss function that makes it easier for the network to handle difficult samples. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved superior matching results compared with existing methods. |
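The fusion layer described in the abstract can be illustrated with a minimal sketch: concatenate an image embedding and a text embedding, project them through a learned linear map, and L2-normalise the result so it can be compared by cosine similarity. The dimensions, the `tanh` nonlinearity, and the function names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(img_feat, txt_feat, W, b):
    """Hypothetical fusion layer: project the concatenated image and
    text features into a shared intermediate representation."""
    x = np.concatenate([img_feat, txt_feat])
    h = np.tanh(W @ x + b)            # nonlinearity chosen for illustration
    return h / np.linalg.norm(h)      # unit-normalise for cosine matching

# Toy dimensions (assumptions, not from the paper)
d_img, d_txt, d_fused = 8, 6, 4
W = rng.standard_normal((d_fused, d_img + d_txt))
b = np.zeros(d_fused)

img = rng.standard_normal(d_img)    # stand-in for a CNN image feature
txt = rng.standard_normal(d_txt)    # stand-in for a text-encoder feature
fused = fuse(img, txt, W, b)        # intermediate-modality vector
```

Because `fused` is unit-length, the dot product between two fused vectors is directly their cosine similarity, which is the usual matching score in common-space retrieval methods.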
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.neucom.2021.01.124 | Neurocomputing |
Keywords | DocType | Volume
---|---|---
Deep learning, Image-text matching, Multimodal, Retrieval | Journal | 442

ISSN | Citations | PageRank
---|---|---
0925-2312 | 2 | 0.42

References | Authors
---|---
0 | 8

Name | Order | Citations | PageRank |
---|---|---|---|
Depeng Wang | 1 | 2 | 0.42 |
Liejun Wang | 2 | 7 | 2.86 |
Shiji Song | 3 | 1247 | 94.76 |
Gao Huang | 4 | 875 | 53.36 |
Yuchen Guo | 5 | 710 | 35.96 |
Shuli Cheng | 6 | 6 | 7.59 |
Naixiang Ao | 7 | 2 | 0.42 |
Anyu Du | 8 | 4 | 4.19 |