Abstract | ||
---|---|---|
This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. Popular metrics, such as BLEU and CIDEr, are based solely on text matching between reference captions and machine-generated captions, potentially leading to biased evaluations because references may not fully cover the image content and natural language is inherently ambiguous. Building upon a machine-learned text-image grounding model, TIGEr evaluates caption quality based not only on how well a caption represents image content, but also on how well machine-generated captions match human-generated captions. Our empirical tests show that TIGEr is more consistent with human judgments than existing alternative metrics. We also comprehensively assess the metric's effectiveness in caption evaluation by measuring the correlation between human judgments and metric scores. |
Year | DOI | Venue |
---|---|---|
2019 | 10.18653/v1/D19-1220 | EMNLP/IJCNLP (1) |
DocType | Volume | Citations |
---|---|---|
Conference | D19-1 | 1 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ming Jiang | 1 | 1 | 0.68 |
Qiuyuan Huang | 2 | 176 | 17.66 |
Lei Zhang | 3 | 1 | 0.34 |
Xin Wang | 4 | 1 | 0.34 |
Pengchuan Zhang | 5 | 31 | 8.17 |
Zhe Gan | 6 | 319 | 32.58 |
Jana Diesner | 7 | 216 | 24.38 |
Jianfeng Gao | 8 | 5729 | 296.43 |