Title
Combining Vision and Language Representations for Patch-based Identification of Lexico-Semantic Relations
Abstract
Although a wide range of applications have been proposed in the field of multimodal natural language processing, few works have tackled multimodal relational lexical semantics. In this paper, we propose the first attempt to identify lexico-semantic relations, which embody linguistic phenomena such as synonymy, co-hyponymy, and hypernymy, using visual clues. While traditional methods rely on the paradigmatic approach and/or the distributional hypothesis, we hypothesize that visual information can supplement textual information, drawing on the apperceptum subcomponent of the semiotic textology linguistic theory. For that purpose, we automatically extend two gold-standard datasets with visual information and develop different fusion techniques that combine the textual and visual modalities following a patch-based strategy. Experimental results on the multimodal datasets show that visual information can supply semantics missing from textual encodings, yielding reliable performance improvements.
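The abstract names the approach only at a high level. As a minimal sketch of what patch-based fusion of the two modalities could look like, the PyTorch snippet below concatenates a word-pair text encoding with mean-pooled image patch embeddings before a relation classifier. All class names, dimensions, and the label set here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and label count; the paper does not specify them here.
TEXT_DIM, PATCH_DIM, N_PATCHES, N_RELATIONS = 300, 512, 49, 4

class ConcatFusionClassifier(nn.Module):
    """Illustrative late fusion: concatenate a word-pair text encoding
    with mean-pooled visual patch embeddings, then classify the relation."""
    def __init__(self):
        super().__init__()
        self.proj_text = nn.Linear(TEXT_DIM, 256)
        self.proj_vis = nn.Linear(PATCH_DIM, 256)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            # e.g. synonymy, co-hyponymy, hypernymy, random (assumed label set)
            nn.Linear(512, N_RELATIONS),
        )

    def forward(self, text_vec, patch_vecs):
        # text_vec:   (B, TEXT_DIM) encoding of the word pair
        # patch_vecs: (B, N_PATCHES, PATCH_DIM) image patch embeddings
        vis = patch_vecs.mean(dim=1)  # pool patches into one visual vector
        fused = torch.cat([self.proj_text(text_vec),
                           self.proj_vis(vis)], dim=-1)
        return self.classifier(fused)

# Usage with random tensors standing in for real encodings.
model = ConcatFusionClassifier()
logits = model(torch.randn(8, TEXT_DIM), torch.randn(8, N_PATCHES, PATCH_DIM))
print(logits.shape)  # torch.Size([8, 4])
```

Concatenation is only one of several plausible fusion strategies (others include attention-based or gated fusion); the pooled-patch design simply makes the "visual clues supplement textual encodings" idea concrete.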
Year: 2022
DOI: 10.1145/3503161.3548299
Venue: International Multimedia Conference
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 7
Name               Order   Citations   PageRank
Prince Jha         1       0           0.34
Gaël Dias          2       354         41.95
Alexis Lechervy    3       0           0.34
José G. Moreno     4       0           0.34
Anubhav Jangra     5       0           0.34
Sebastião Pais     6       0           0.34
Sriparna Saha      7       1064        106.07