Deep Top-$k$ Ranking for Image–Sentence Matching - Citegraph

Paper Info

Title
Deep Top-$k$ Ranking for Image–Sentence Matching

Abstract
Image–sentence matching is a challenging task because of the heterogeneity gap between different modalities. Ranking-based methods have achieved excellent performance in this task in past decades. Given an image query, these methods typically assume that the correctly matched image–sentence pair must rank before all mismatched ones. However, this assumption may be too strict and prone to overfitting, especially when some sentences in a massive database are similar and easily confused with one another. In this paper, we relax the traditional ranking loss and propose a novel deep multi-modal network with a top-$k$ ranking loss to mitigate the data ambiguity problem. With this strategy, query results are not penalized unless the index of the ground truth falls outside the top-$k$ query results. Because the initial top-$k$ ranking loss is non-smooth and non-convex, we use a tight convex upper bound to approximate the loss and then optimize the deep multi-modal network with the traditional back-propagation algorithm. Finally, we apply the method to three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO. Empirical results on the R@K metrics (K = 1, 5, 10) show that our method achieves performance comparable to state-of-the-art methods.
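The top-$k$ criterion and the R@K metric mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and does not include the convex upper bound used for training. It assumes a hypothetical square similarity matrix `sim` in which candidate `i` is the single correct match for query `i`; the function names and the toy scores are illustrative only.

```python
import numpy as np

def ground_truth_ranks(sim):
    """Rank (1 = best) of the matching candidate for each query.

    Assumption: sim[i, j] is the similarity between query i and
    candidate j, and candidate i is the only correct match for query i.
    """
    gt = np.diag(sim)[:, None]
    # Rank is 1 plus the number of candidates scoring strictly higher.
    return 1 + (sim > gt).sum(axis=1)

def top_k_zero_one_loss(sim, k):
    """1.0 where the ground truth falls outside the top-k results, else 0.0."""
    return (ground_truth_ranks(sim) > k).astype(float)

def recall_at_k(sim, k):
    """Fraction of queries whose ground truth appears in the top-k results."""
    return float((ground_truth_ranks(sim) <= k).mean())

# Toy similarity scores (made up for illustration).
sim = np.array([
    [0.9, 0.2, 0.1, 0.3],
    [0.8, 0.5, 0.6, 0.1],
    [0.7, 0.6, 0.2, 0.9],
    [0.1, 0.2, 0.3, 0.4],
])

print(ground_truth_ranks(sim))      # ranks per query
print(top_k_zero_one_loss(sim, 3))  # penalty only when rank > 3
print(recall_at_k(sim, 1), recall_at_k(sim, 3))
```

With `k = 1` the penalty reduces to the strict assumption that the ground truth must rank first; a larger `k` tolerates near-duplicate candidates that outrank it.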

Year	DOI	Venue
2020	10.1109/TMM.2019.2931352	IEEE Transactions on Multimedia
Keywords	Field	DocType
Task analysis,Bidirectional control,Databases,Training,Deep learning,Sports,Semantics	Pattern recognition,Ranking,Computer science,Natural language processing,Artificial intelligence,Sentence	Journal
Volume	Issue	ISSN
22	3	1520-9210
Citations	PageRank	References
5	0.38	0
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Lingling Zhang	1	276	45.79
Minnan Luo	2	269	21.18
Jun Liu	3	178	25.96
Xiaojun Chang	4	1585	76.85
Yi Yang	5	6873	271.72
Alexander G. Hauptmann	6	7472	558.23
