Title
Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles
Abstract
Recent pre-trained language models (PrLMs) offer a new, performant way of building contextualized word representations by leveraging sequence-level context during modeling. Although PrLMs generally provide more effective contextualized word representations than non-contextualized models, they are still confined to a sequence of text contexts, without diverse hints from other modalities. This paper therefore proposes a visual representation method that explicitly enhances conventional word embeddings with multiple-aspect senses drawn from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset in which each word corresponds to diverse related images. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and generalization capability of the proposed approach. Analysis shows that our method with visual guidance pays more attention to content words, improves representation diversity, and is potentially beneficial for enhancing disambiguation accuracy.
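
The abstract describes the approach only at a high level. As an illustrative sketch, not the paper's implementation, the snippet below shows one plausible way to fuse features looked up from a word-image dictionary with contextual token embeddings through a learned gate; the class name VisualGuidedEmbedding, the 768/2048 dimensions, the zero-vector fallback for out-of-dictionary words, and the gated-sum fusion are all assumptions made for illustration (PyTorch).

    # Illustrative sketch only -- not the paper's released code. Assumes
    # hypothetical inputs: contextual token embeddings from any PrLM and a
    # word -> image-feature dictionary (e.g., pooled CNN features per word).
    import torch
    import torch.nn as nn

    class VisualGuidedEmbedding(nn.Module):
        """Enhance a token embedding with visual features retrieved from a
        word-image dictionary, mixed in through a learned sigmoid gate."""

        def __init__(self, text_dim: int, image_dim: int):
            super().__init__()
            self.proj = nn.Linear(image_dim, text_dim)     # image space -> text space
            self.gate = nn.Linear(2 * text_dim, text_dim)  # how much vision to mix in

        def forward(self, token_emb: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
            visual = self.proj(image_feat)                                   # (B, L, text_dim)
            g = torch.sigmoid(self.gate(torch.cat([token_emb, visual], -1)))
            return token_emb + g * visual                                    # visually enhanced embedding

    # Toy usage with random tensors standing in for real text and image encoders.
    word_image_dict = {"apple": torch.randn(2048), "doctor": torch.randn(2048)}
    tokens = ["apple", "keeps", "doctor"]
    token_emb = torch.randn(1, len(tokens), 768)
    # Words missing from the dictionary fall back to an all-zero visual vector.
    image_feat = torch.stack(
        [word_image_dict.get(w, torch.zeros(2048)) for w in tokens]
    ).unsqueeze(0)

    fusion = VisualGuidedEmbedding(text_dim=768, image_dim=2048)
    print(fusion(token_emb, image_feat).shape)  # torch.Size([1, 3, 768])

The per-token gate is one simple way to let the model weight the visual signal differently across tokens, e.g., for content words versus function words.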
Year
2022
DOI
10.1109/TASLP.2021.3130972
Venue
IEEE/ACM Transactions on Audio, Speech and Language Processing
Keywords
Task analysis, Visualization, Dictionaries, Context modeling, Machine translation, Image representation, Speech processing, Multimodal learning, natural language processing, pre-trained models, vision-language modeling, word representations
DocType
Journal
Volume
10.5555
Issue
taslp.2022.issue-30
ISSN
2329-9290
Citations
0
PageRank
0.34
References
16
Authors
4
Name | Order | Citations | PageRank
Zhuosheng Zhang | 1 | 57 | 14.93
Haojie Yu | 2 | 0 | 0.34
Hai Zhao | 3 | 960 | 113.64
Masao Utiyama | 4 | 714 | 86.69