Title
Compositional Generalization in Image Captioning
Abstract
Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image-sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.
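The re-ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the image and each candidate caption have already been mapped into a shared embedding space; the function names `cosine` and `rerank`, the toy two-dimensional vectors, and the use of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(image_vec, candidates):
    """Order candidate captions by similarity to the image, most similar first.

    candidates: list of (caption, caption_vec) pairs. The embeddings are
    assumed to come from some jointly trained image-sentence ranking model
    (hypothetical here).
    """
    scored = [(cosine(image_vec, vec), caption) for caption, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored]

# Toy example with made-up embeddings.
image_vec = [1.0, 0.0]
candidates = [
    ("a black dog", [0.0, 1.0]),
    ("a white dog", [1.0, 0.1]),
]
ranked = rerank(image_vec, candidates)
```

In this toy setup the caption whose embedding lies closest to the image embedding moves to the front of the list, which is the general idea behind choosing among generated candidates by image-caption similarity.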
Year
2019
DOI
10.18653/v1/k19-1009
Venue
2986729057
Field
Closed captioning, Computer science, Artificial intelligence, Natural language processing
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
5
Name | Order | Citations | PageRank
Mitja Nikolaus | 1 | 0 | 1.01
Mostafa Abdou | 2 | 0 | 4.73
Matthew Lamm | 3 | 26 | 4.82
Rahul Aralikatte | 4 | 2 | 2.74
Desmond Elliott | 5 | 309 | 24.91