Abstract
---
Image captioning is gaining significance in multiple applications such as content-based visual search and chatbots. Much of the recent progress in this field embraces a data-driven approach without deep consideration of human behavioural characteristics. In this paper, we focus on human-centered automatic image captioning. Our study is based on the intuition that different people will generate a variety of image captions for the same scene, as their knowledge of and opinions about the scene may differ. In particular, we first perform a series of human studies to investigate what influences human descriptions of a visual scene. We identify three main factors: a person's knowledge level of the scene, opinion on the scene, and gender. Based on our findings, we propose a novel human-centered algorithm that generates human-like image captions. We evaluate the proposed model with traditional evaluation metrics, diversity metrics, and human-based evaluation. Experimental results demonstrate the superiority of our model in generating diverse, human-like image captions.
Year | DOI | Venue
---|---|---
2020 | 10.1145/3394171.3413589 | MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020
DocType | ISBN | Citations | PageRank | References | Authors
---|---|---|---|---|---
Conference | 978-1-4503-7988-5 | 0 | 0.34 | 0 | 5
Name | Order | Citations | PageRank
---|---|---|---
Shuang Wu | 1 | 0 | 0.34
Shaojing Fan | 2 | 22 | 5.63
Zhiqi Shen | 3 | 1148 | 82.57
Mohan Kankanhalli | 4 | 3825 | 299.56
Anthony K. H. Tung | 5 | 3263 | 189.90