Title
CapVis: Toward Better Understanding of Visual-Verbal Saliency Consistency
Abstract
When looking at an image, humans shift their attention toward interesting regions, producing sequences of eye fixations. When describing an image, they compose simple sentences that highlight the key elements in the scene. How well does where people look correlate with what they describe? To investigate this question intuitively, we develop a visual analytics system, CapVis, that examines eye fixations and image captions, two types of subjective annotations that are relatively task-free and natural. Using these annotations, we propose a word-weighting scheme that extracts visual and verbal saliency ranks so the two can be compared against each other. Our approach proposes and visualizes a number of low-level and semantic-level features relevant to visual-verbal saliency consistency, supporting a better understanding of image content. It also reveals the different ways that humans and computational models look at and describe images, which provides reliable cues for a captioning model. Experiments further show that the visualized features can be integrated into a computational model to effectively predict the consistency between the two modalities on an image dataset with both types of annotations.
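The abstract's core comparison, visual saliency ranks versus verbal saliency ranks, can be illustrated with a small rank-correlation sketch. This is a minimal, hypothetical example, not the paper's actual word-weighting scheme: the object labels, scores, and the choice of Spearman's rho are all assumptions made for illustration.

```python
# Illustrative sketch only: compare a visual saliency ranking (e.g., from
# fixation density per object region) against a verbal saliency ranking
# (e.g., from word weights aggregated over captions). All values below are
# hypothetical placeholder data, not results from the paper.
from scipy.stats import spearmanr

# Hypothetical per-object visual saliency scores.
visual_saliency = {"person": 0.46, "dog": 0.31, "frisbee": 0.15, "tree": 0.08}

# Hypothetical per-object verbal saliency scores.
verbal_saliency = {"person": 0.40, "dog": 0.35, "frisbee": 0.20, "tree": 0.05}

objects = sorted(visual_saliency)  # shared vocabulary of annotated objects
v = [visual_saliency[o] for o in objects]
t = [verbal_saliency[o] for o in objects]

# Spearman's rho compares the two rankings: +1 means both modalities order
# the objects identically; 0 means no monotonic relationship.
rho, pval = spearmanr(v, t)
print(f"visual-verbal rank correlation: rho={rho:.2f} (p={pval:.2f})")
```

A rank correlation is used here (rather than comparing raw scores) because the two modalities are measured on incommensurate scales; only the relative ordering of objects is directly comparable.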
Year
2019
DOI
10.1145/3200767
Venue
ACM TIST
Keywords
Image captioning, visual analytics, visual saliency
Field
Modalities, Closed captioning, Fixation (psychology), Salience (neuroscience), Computer science, Visual analytics, Image content, Visual attention, Correlation, Natural language processing, Artificial intelligence, Machine learning
DocType
Journal
Volume
10
Issue
1
ISSN
2157-6904
Citations
0
PageRank
0.34
References
30
Authors
4
Name           Order  Citations  PageRank
Haoran Liang   1      6          2.09
Ming Jiang     2      47         12.22
Ronghua Liang  3      376        42.60
Qi Zhao        4      683        44.60