Abstract

In this paper, we introduce the task of automatically generating text to describe the differences between two similar images. We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage. Annotators were asked to succinctly describe all the differences in a short paragraph. As a result, our novel dataset provides an opportunity to explore models that align language and vision, and capture visual salience. The dataset may also be a useful benchmark for coherent multi-sentence generation. We perform a first-pass visual analysis that exposes clusters of differing pixels as a proxy for object-level differences. We propose a model that captures visual salience by using a latent variable to align clusters of differing pixels with output sentences. We find that, for both single-sentence and multi-sentence generation, the proposed model outperforms models that use attention alone.
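The record does not include the paper's actual procedure for exposing "clusters of differing pixels," so the following is only a minimal Python sketch of the general idea: threshold the per-pixel difference between the two frames and group the surviving pixels with connected components. The threshold, minimum cluster size, and choice of connected-component grouping are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy import ndimage

def difference_clusters(img1, img2, thresh=30, min_size=50):
    """Group differing pixels between two aligned H x W x 3 images into
    clusters, as a rough proxy for object-level differences.
    `thresh` and `min_size` are illustrative values, not from the paper."""
    # Per-pixel absolute difference, averaged over color channels.
    diff = np.abs(img1.astype(np.int32) - img2.astype(np.int32)).mean(axis=-1)
    mask = diff > thresh
    # Connected components merge adjacent differing pixels into clusters.
    labels, n = ndimage.label(mask)
    clusters = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        if ys.size >= min_size:  # discard tiny noise clusters
            clusters.append((xs.min(), ys.min(), xs.max(), ys.max()))  # bbox
    return clusters
```

Since the frame pairs come from fixed surveillance cameras, the two images are already roughly aligned, which is what makes simple per-pixel differencing a plausible starting point here.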
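The abstract also does not spell out how the latent variable aligns clusters with sentences. One plausible reading, written here with assumed notation (output sentences $s_1,\dots,s_T$, cluster features $c_1,\dots,c_K$, latent alignment $z_t$), is a per-sentence mixture over which difference cluster is being described:

```latex
% Hypothetical factorization (notation assumed, not from the paper):
% each sentence s_t marginalizes over the cluster z_t it describes.
p(s_1,\dots,s_T \mid I, I')
  = \prod_{t=1}^{T} \sum_{k=1}^{K} p(z_t = k \mid c_{1:K})\, p(s_t \mid c_k)
```

Marginalizing over $z_t$ rather than attending softly to all clusters would be one way such a model could outperform attention-only baselines, as the abstract reports.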
| Year | DOI | Venue | DocType | Volume | Citations | PageRank | References | Authors |
|---|---|---|---|---|---|---|---|---|
| 2018 | 10.18653/v1/d18-1436 | EMNLP | Conference | abs/1808.10584 | 0 | 0.34 | 21 | 2 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Harsh Jhamtani | 1 | 19 | 6.51 |
| Taylor Berg-Kirkpatrick | 2 | 554 | 35.93 |