Title
Exploring Pairwise Relationships Adaptively From Linguistic Context in Image Captioning
Abstract
For image captioning, recent works have begun to explore visual relationships in order to generate high-quality interactive words (i.e., verbs and prepositions). However, many existing works operate only at the semantic level, analysing feature similarity between objects in the visual domain while ignoring the linguistic context available in the caption decoder. During captioning, entity words can be inferred from the visual information of objects, whereas the interactive words that represent the relationships between entity words can only be inferred from the high-level language meaning generated during caption decoding. This high-level language meaning is called linguistic context; it refers to the relational context between words or phrases in the caption sentence. The linguistic context can serve as strong guidance for exploring related visual relationships between different objects effectively. To achieve this, we propose a novel context-adaptive attention module that is strongly driven by the linguistic context from the caption decoder. In this module, a novel visual relationship attention is designed on top of a bilinear self-attention model to explore related visual relationships and encode more discriminative features under the linguistic context. To attend adaptively either to related visual relationships when generating interactive words or to related visual objects when generating entity words, an attention modulator is integrated as an attention channel controller that responds dynamically to the changing linguistic context of the caption decoder. In experiments on the MSCOCO dataset, our model achieves promising performance compared with all counterpart models that explore visual relationships.
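The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a decoder state `h` stands in for the linguistic context, that pair features are the element-wise product of two object features (a simple bilinear-style interaction), and that a single sigmoid gate plays the role of the attention modulator. All function names, the gate parameters `gate_w` and `gate_b`, and the scoring functions are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def object_attention(h, objects):
    """Attend over single objects, scored by dot product with the context h."""
    weights = softmax([dot(h, v) for v in objects])
    return weighted_sum(weights, objects)


def relationship_attention(h, objects):
    """Attend over object pairs (i < j).

    Assumption: a pair feature is the element-wise product of the two
    object features, scored against the same context vector h.
    """
    n = len(objects)
    pairs = [
        [a * b for a, b in zip(objects[i], objects[j])]
        for i in range(n)
        for j in range(i + 1, n)
    ]
    weights = softmax([dot(h, p) for p in pairs])
    return weighted_sum(weights, pairs)


def modulated_context(h, objects, gate_w, gate_b):
    """Blend the two attention channels with a context-driven gate.

    gate close to 1 favours relationship features (interactive words);
    gate close to 0 favours object features (entity words).
    """
    gate = 1.0 / (1.0 + math.exp(-(dot(gate_w, h) + gate_b)))
    c_obj = object_attention(h, objects)
    c_rel = relationship_attention(h, objects)
    context = [gate * r + (1.0 - gate) * o for r, o in zip(c_rel, c_obj)]
    return gate, context


# Toy usage: three 2-d object features and a 2-d decoder state.
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.5, -0.5]
gate, context = modulated_context(h, objects, gate_w=[1.0, 1.0], gate_b=0.0)
```

In this sketch the gate depends only on the decoder state, so the mix between the object channel and the relationship channel can change at every decoding step, which is the behaviour the abstract attributes to the attention modulator.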
Year: 2022
DOI: 10.1109/TMM.2021.3093725
Venue: IEEE TRANSACTIONS ON MULTIMEDIA
Keywords: Visualization, Linguistics, Decoding, Modulation, Context modeling, Adaptation models, Semantics, Bilinear attention, bilinear self-attention, context-adaptive attention, dynamic linguistic context, image captioning, visual relationship attention
DocType: Journal
Volume: 24
ISSN: 1520-9210
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name             Order  Citations  PageRank
Zongjian Zhang   1      0          0.34
Qiang Wu         2      304        40.42
Yang Wang        3      9          6.83
Fang Chen        4      0          0.34