Compositional Generalization in Image Captioning - Citegraph

Paper Info

Title
Compositional Generalization in Image Captioning

Abstract
Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image-sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.

Year	2019
DOI	10.18653/v1/k19-1009
Venue	2986729057
Field	Closed captioning, Computer science, Artificial intelligence, Natural language processing
DocType	Conference
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Mitja Nikolaus	1	0	1.01
Mostafa Abdou	2	0	4.73
Matthew Lamm	3	26	4.82
Rahul Aralikatte	4	2	2.74
Desmond Elliott	5	309	24.91