Title
Deep Top-$k$ Ranking for Image–Sentence Matching
Abstract
Image–sentence matching is challenging because of the heterogeneity gap between modalities. Ranking-based methods have achieved excellent performance on this task over the past decades. Given an image query, these methods typically assume that the correctly matched image–sentence pair must rank ahead of all mismatched ones. However, this assumption may be too strict and prone to overfitting, especially when some sentences in a massive database are similar and easily confused with one another. In this paper, we relax the traditional ranking loss and propose a novel deep multi-modal network with a top-$k$ ranking loss to mitigate the data ambiguity problem. With this strategy, a query result is penalized only when the ground truth falls outside the top-$k$ results. Because the original top-$k$ ranking loss is non-smooth and non-convex, we approximate it with a tight convex upper bound and then optimize the deep multi-modal network with the traditional back-propagation algorithm. Finally, we evaluate the method on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO. Empirical results on the R@K metrics (K = 1, 5, 10) show that our method achieves performance comparable to state-of-the-art methods.
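The relaxation described in the abstract can be illustrated with a small sketch. The code below is not the paper's formulation; it assumes a single query with a list of similarity scores, and it uses a top-$k$ hinge form known from the multiclass classification literature as one example of a convex upper bound. The function names, the `margin` parameter, and the score values are hypothetical.

```python
def topk_zero_one_loss(scores, gt_index, k):
    """Return 0 if the ground truth ranks within the top k, else 1.

    This is the non-smooth, non-convex target: nothing is penalized
    as long as fewer than k candidates outscore the ground truth.
    """
    gt = scores[gt_index]
    higher = sum(1 for j, s in enumerate(scores) if j != gt_index and s > gt)
    return 0 if higher < k else 1


def topk_hinge_surrogate(scores, gt_index, k, margin=1.0):
    """Illustrative convex surrogate (an assumption, not the paper's bound).

    Builds a margin-violation value for every candidate (0 for the
    ground truth itself), averages the k largest, and clips at zero.
    With margin=1.0 this is never smaller than the 0/1 loss above.
    """
    gt = scores[gt_index]
    violations = [
        0.0 if j == gt_index else margin + s - gt
        for j, s in enumerate(scores)
    ]
    top = sorted(violations, reverse=True)[:k]
    return max(0.0, sum(top) / k)


# Hypothetical similarity scores of one image query against four sentences;
# the ground-truth sentence is at index 2 and ranks third.
scores = [0.9, 0.8, 0.7, 0.2]
for k in (1, 3):
    print(k, topk_zero_one_loss(scores, 2, k), topk_hinge_surrogate(scores, 2, k))
```

With `k = 1` the 0/1 loss penalizes this query, while with `k = 3` it does not, which is the relaxation the abstract describes. The surrogate stays positive in both cases because it also enforces a margin, and it is convex in the scores, so it can be minimized with gradient-based training.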
Year: 2020
DOI: 10.1109/TMM.2019.2931352
Venue: IEEE Transactions on Multimedia
Keywords: Task analysis, Bidirectional control, Databases, Training, Deep learning, Sports, Semantics
Field: Pattern recognition, Ranking, Computer science, Natural language processing, Artificial intelligence, Sentence
DocType: Journal
Volume: 22
Issue: 3
ISSN: 1520-9210
Citations: 5
PageRank: 0.38
References: 0
Authors
6
Name
Order
Citations
PageRank
Lingling Zhang127645.79
Minnan Luo226921.18
Jun Liu317825.96
Xiaojun Chang4158576.85
Yi Yang56873271.72
Alexander G. Hauptmann67472558.23