Abstract | ||
---|---|---|
Image–sentence matching is a challenging task because of the heterogeneity gap between different modalities. Ranking-based methods have achieved excellent performance on this task in past decades. Given an image query, these methods typically assume that the correctly matched image–sentence pair must rank ahead of all mismatched ones. However, this assumption may be too strict and prone to overfitting, especially when some sentences in a massive database are similar to and easily confused with one another. In this paper, we relax the traditional ranking loss and propose a novel deep multi-modal network with a top-$k$ ranking loss to mitigate the data ambiguity problem. With this strategy, query results are not penalized unless the index of the ground truth falls outside the top-$k$ query results. Considering the non-smoothness and non-convexity of the initial top-$k$ ranking loss, we exploit a tight convex upper bound to approximate the loss and then use the traditional back-propagation algorithm to optimize the deep multi-modal network. Finally, we evaluate the method on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO. Empirical results on the R@K metrics (K = 1, 5, 10) show that our method achieves performance comparable to state-of-the-art methods. |
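
To make the relaxation in the abstract concrete, the sketch below contrasts a top-k 0/1 ranking error with a hinge-style convex surrogate of the kind commonly used for top-k classification. It is only an illustration under stated assumptions: the abstract does not give the paper's exact bound, so the surrogate, the function names `topk_zero_one_loss` and `topk_hinge_loss`, and the `margin` parameter are hypothetical and not taken from the paper.

```python
def topk_zero_one_loss(scores, gt, k):
    """Return 1 if the ground truth is ranked outside the top k, else 0.

    scores: similarity of the query to every candidate (higher is better).
    gt: index of the ground-truth candidate.
    """
    # Count candidates that score strictly higher than the ground truth.
    higher = sum(1 for j, s in enumerate(scores) if j != gt and s > scores[gt])
    return 0 if higher < k else 1


def topk_hinge_loss(scores, gt, k, margin=1.0):
    """Hinge-style convex surrogate (an assumed stand-in for the paper's bound).

    Averages the k largest margin violations and clips at zero. With
    margin=1.0 this value is never smaller than topk_zero_one_loss.
    """
    # Margin violation of each candidate; the ground truth contributes 0.
    violations = [
        0.0 if j == gt else margin + s - scores[gt]
        for j, s in enumerate(scores)
    ]
    top = sorted(violations, reverse=True)[:k]
    return max(0.0, sum(top) / k)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]  # hypothetical similarities for one query
    gt = 1                         # the ground truth is ranked second
    for k in (1, 2):
        print(k, topk_zero_one_loss(scores, gt, k), topk_hinge_loss(scores, gt, k))
```

In this example the ground truth sits at rank 2, so the 0/1 error is incurred for k = 1 but not for k = 2, which is the behaviour the abstract describes. The surrogate is a maximum of averages of affine functions of the scores, so it is convex in the scores and can be minimized with standard gradient-based training.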
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TMM.2019.2931352 | IEEE Transactions on Multimedia |
Keywords | Field | DocType |
Task analysis, Bidirectional control, Databases, Training, Deep learning, Sports, Semantics | Pattern recognition, Ranking, Computer science, Natural language processing, Artificial intelligence, Sentence | Journal |
Volume | Issue | ISSN |
22 | 3 | 1520-9210 |
Citations | PageRank | References |
5 | 0.38 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lingling Zhang | 1 | 276 | 45.79 |
Minnan Luo | 2 | 269 | 21.18 |
Jun Liu | 3 | 178 | 25.96 |
Xiaojun Chang | 4 | 1585 | 76.85 |
Yi Yang | 5 | 6873 | 271.72 |
Alexander G. Hauptmann | 6 | 7472 | 558.23 |