Title
CapVis: Toward Better Understanding of Visual-Verbal Saliency Consistency
Abstract
When looking at an image, humans shift their attention toward interesting regions, producing sequences of eye fixations. When describing an image, they compose simple sentences that highlight the key elements in the scene. How well does where people look correlate with what they describe? To investigate this question intuitively, we develop a visual analytics system, CapVis, that examines eye fixations and image captions, two types of subjective annotations that are relatively task-free and natural. Using these annotations, we propose a word-weighting scheme that extracts visual and verbal saliency ranks so the two can be compared against each other. Our approach proposes and visualizes a number of low-level and semantic-level features relevant to visual-verbal saliency consistency, supporting a better understanding of image content. It also reveals the different ways that humans and computational models look at and describe images, which provides reliable cues for a captioning model. Experiments further show that the visualized features can be integrated into a computational model to effectively predict the consistency between the two modalities on an image dataset with both types of annotations.
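The abstract's core comparison, visual saliency ranks versus verbal saliency ranks, can be illustrated with a small rank-correlation sketch. This is a minimal, hypothetical example, not the paper's actual word-weighting scheme: the object labels, scores, and the choice of Spearman's rho are all assumptions made for illustration.

```python
# Illustrative sketch only: compare a visual saliency ranking (e.g., from
# fixation density per object region) against a verbal saliency ranking
# (e.g., from word weights aggregated over captions). All values below are
# hypothetical placeholder data, not results from the paper.
from scipy.stats import spearmanr

# Hypothetical per-object visual saliency scores.
visual_saliency = {"person": 0.46, "dog": 0.31, "frisbee": 0.15, "tree": 0.08}

# Hypothetical per-object verbal saliency scores.
verbal_saliency = {"person": 0.40, "dog": 0.35, "frisbee": 0.20, "tree": 0.05}

objects = sorted(visual_saliency)  # shared vocabulary of annotated objects
v = [visual_saliency[o] for o in objects]
t = [verbal_saliency[o] for o in objects]

# Spearman's rho compares the two rankings: +1 means both modalities order
# the objects identically; 0 means no monotonic relationship.
rho, pval = spearmanr(v, t)
print(f"visual-verbal rank correlation: rho={rho:.2f} (p={pval:.2f})")
```

A rank correlation is used here (rather than comparing raw scores) because the two modalities are measured on incommensurate scales; only the relative ordering of objects is directly comparable.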
Year
2019
DOI
10.1145/3200767
Venue
ACM TIST
Keywords
Image captioning, visual analytics, visual saliency
Field
Modalities, Closed captioning, Fixation (psychology), Salience (neuroscience), Computer science, Visual analytics, Image content, Visual attention, Correlation, Natural language processing, Artificial intelligence, Machine learning
DocType
Journal
Volume
10
Issue
1
ISSN
2157-6904
Citations
0
PageRank
0.34
References
30
Authors
4
Name           Order  Citations  PageRank
Haoran Liang   1      6          2.09
Ming Jiang     2      47         12.22
Ronghua Liang  3      376        42.60
Qi Zhao        4      683        44.60