Abstract |
---|
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by the use of hard negatives in structured prediction, and ranking loss functions used in retrieval, we introduce a simple change to common loss functions used to learn multi-modal embeddings. That, combined with fine-tuning and the use of augmented data, yields significant gains in retrieval performance. We showcase our approach, dubbed VSE++, on the MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8% in caption retrieval and 11.3% in image retrieval (based on R@1). |
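
The "simple change" the abstract refers to is replacing the usual sum of hinge penalties over all negatives with a penalty on only the hardest in-batch negative. Below is a minimal PyTorch sketch of such a hard-negative triplet ranking loss; it assumes L2-normalized embeddings scored by dot product (cosine similarity) and in-batch negative mining, and the function name and margin value are illustrative, not taken from the paper.

```python
import torch


def vse_max_hinge_loss(im, cap, margin=0.2):
    """Triplet ranking loss using the hardest in-batch negative.

    im, cap: L2-normalized image and caption embeddings, shape (B, D);
    row i of `im` matches row i of `cap`. (Sketch; margin is assumed.)
    """
    scores = im @ cap.t()                # (B, B) pairwise similarities
    pos = scores.diag().view(-1, 1)      # matched-pair scores

    # Hinge cost of every in-batch negative against its positive pair.
    cost_cap = (margin + scores - pos).clamp(min=0)      # negative captions per image
    cost_im = (margin + scores - pos.t()).clamp(min=0)   # negative images per caption

    # Zero the diagonal so positives are never treated as negatives.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0.0)
    cost_im = cost_im.masked_fill(mask, 0.0)

    # Key difference from the sum-of-hinges loss: keep only the hardest
    # (maximum-cost) negative in each row/column instead of summing them all.
    return cost_cap.max(dim=1)[0].sum() + cost_im.max(dim=0)[0].sum()
```

Because the embeddings are normalized, the dot product already equals cosine similarity, so no extra normalization is needed inside the loss.
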
Year | Venue | Field
---|---|---|
2018 | British Machine Vision Conference | Ranking, Structured prediction, Image retrieval, Artificial intelligence, Mathematics, Machine learning

DocType | Citations | PageRank
---|---|---|
Conference | 21 | 0.64

References | Authors
---|---|
15 | 4

Name | Order | Citations | PageRank
---|---|---|---|
Fartash Faghri | 1 | 61 | 3.88 |
David J. Fleet | 2 | 21 | 1.65 |
Jamie Ryan Kiros | 3 | 21 | 0.64 |
Sanja Fidler | 4 | 183 | 10.30 |