Integrating Language Guidance into Vision-based Deep Metric Learning - Citegraph

Paper Info

Title
Integrating Language Guidance into Vision-based Deep Metric Learning

Abstract
Deep Metric Learning (DML) proposes to learn metric spaces which encode semantic similarities as embedding space distances. These spaces should be transferable to classes beyond those seen during training. Commonly, DML methods task networks to solve contrastive ranking tasks defined over binary class assignments. However, such approaches ignore higher-level semantic relations between the actual classes. This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes, impacting the generalizability of the learned metric space. To tackle this issue, we propose a language guidance objective for visual similarity learning. Leveraging language embeddings of expert- and pseudo-classnames, we contextualize and realign visual representation spaces corresponding to meaningful language semantics for better semantic consistency. Extensive experiments and ablations provide a strong motivation for our proposed approach and show language guidance offering significant, model-agnostic improvements for DML, achieving competitive and state-of-the-art results on all benchmarks. Code available at github.com/ExplainableML/LanguageGuidance-for_DML.
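The abstract's central idea, realigning visual similarities with language similarities between class names, can be illustrated with a minimal sketch. It assumes a row-wise KL divergence between softmaxed cosine-similarity matrices; that choice, the `temperature` parameter, and all function names are illustrative assumptions, not the paper's exact objective or the API of its released code.

```python
import math

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between the rows of a list of vectors."""
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]

def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def language_guidance_loss(visual_embeddings, language_embeddings, temperature=1.0):
    """Mean row-wise KL(language || visual) over softmaxed similarity rows.

    Hypothetical helper: row i of each input belongs to the same sample, where
    the language row is an embedding of that sample's class name.
    """
    vis_sim = cosine_similarity_matrix(visual_embeddings)
    lang_sim = cosine_similarity_matrix(language_embeddings)
    loss = 0.0
    for vis_row, lang_row in zip(vis_sim, lang_sim):
        p = softmax(lang_row, temperature)  # target distribution from language
        q = softmax(vis_row, temperature)   # distribution from the visual space
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss / len(vis_sim)
```

In a training setup, a term like this would presumably be added to a standard DML ranking loss, so that the visual embedding space is nudged toward the semantic relations encoded by the class-name embeddings.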

Field	Value
Year	2022
DOI	10.1109/CVPR52688.2022.01570
Venue	IEEE Conference on Computer Vision and Pattern Recognition
Keywords	Recognition: detection, categorization, retrieval; Representation learning; Transfer/low-shot/long-tail learning; Vision + language
DocType	Conference
Volume	2022
Issue	1
Citations	0
PageRank	0.34
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Karsten Roth	1	5	4.17
Oriol Vinyals	2	9419	418.45
Zeynep Akata	3	835	42.24
