Title
Combining Vision and Language Representations for Patch-based Identification of Lexico-Semantic Relations
Abstract
Although a wide range of applications have been proposed in the field of multimodal natural language processing, few works have tackled multimodal relational lexical semantics. In this paper, we propose the first attempt to identify lexico-semantic relations, which embody linguistic phenomena such as synonymy, co-hyponymy, and hypernymy, using visual clues. While traditional methods rely on the paradigmatic approach and/or the distributional hypothesis, we hypothesize that visual information can supplement textual information, drawing on the apperceptum subcomponent of the semiotic textology linguistic theory. For that purpose, we automatically extend two gold-standard datasets with visual information and develop different fusion techniques that combine the textual and visual modalities following a patch-based strategy. Experimental results on the multimodal datasets show that visual information can supply semantics missing from textual encodings, yielding reliable performance improvements.
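The abstract names the approach only at a high level. As a minimal sketch of what patch-based fusion of the two modalities could look like, the PyTorch snippet below concatenates a word-pair text encoding with mean-pooled image patch embeddings before a relation classifier. All class names, dimensions, and the label set here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and label count; the paper does not specify them here.
TEXT_DIM, PATCH_DIM, N_PATCHES, N_RELATIONS = 300, 512, 49, 4

class ConcatFusionClassifier(nn.Module):
    """Illustrative late fusion: concatenate a word-pair text encoding
    with mean-pooled visual patch embeddings, then classify the relation."""
    def __init__(self):
        super().__init__()
        self.proj_text = nn.Linear(TEXT_DIM, 256)
        self.proj_vis = nn.Linear(PATCH_DIM, 256)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            # e.g. synonymy, co-hyponymy, hypernymy, random (assumed label set)
            nn.Linear(512, N_RELATIONS),
        )

    def forward(self, text_vec, patch_vecs):
        # text_vec:   (B, TEXT_DIM) encoding of the word pair
        # patch_vecs: (B, N_PATCHES, PATCH_DIM) image patch embeddings
        vis = patch_vecs.mean(dim=1)  # pool patches into one visual vector
        fused = torch.cat([self.proj_text(text_vec),
                           self.proj_vis(vis)], dim=-1)
        return self.classifier(fused)

# Usage with random tensors standing in for real encodings.
model = ConcatFusionClassifier()
logits = model(torch.randn(8, TEXT_DIM), torch.randn(8, N_PATCHES, PATCH_DIM))
print(logits.shape)  # torch.Size([8, 4])
```

Concatenation is only one of several plausible fusion strategies (others include attention-based or gated fusion); the pooled-patch design simply makes the "visual clues supplement textual encodings" idea concrete.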
Year: 2022
DOI: 10.1145/3503161.3548299
Venue: International Multimedia Conference
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 7
Name               Order   Citations   PageRank
Prince Jha         1       0           0.34
Gaël Dias          2       354         41.95
Alexis Lechervy    3       0           0.34
José G. Moreno     4       0           0.34
Anubhav Jangra     5       0           0.34
Sebastião Pais     6       0           0.34
Sriparna Saha      7       1064        106.07