Title: Learning Words by Drawing Images
Abstract: We propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes from a dataset of spoken descriptions of images. Building on recent findings that GAN representations can be manipulated to edit semantic concepts in the generated output, we propose a new method that uses GAN-generated images to train a model with a triplet loss. To apply the method, we develop Audio CLEVRGAN, a new dataset of audio descriptions of GAN-generated CLEVR images, and we describe a training procedure that creates a curriculum of GAN-generated images, focusing training on image pairs that differ in a specific, informative way. Training requires no supervision beyond the spoken captions and the GAN itself. We find that exploiting GAN-edited examples improves the model's ability to learn attributes over previous results, and that the proposed framework yields models that can associate spoken words with abstract visual concepts such as color and size.
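To make the training objective concrete, below is a minimal sketch of a triplet loss over a spoken caption and a GAN-edited image pair, written in PyTorch-style Python. The function name, margin, embedding size, and the random tensors standing in for encoder outputs are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def caption_image_triplet_loss(audio_emb, pos_img_emb, neg_img_emb, margin=1.0):
    # Hinge triplet loss: pull the caption embedding toward the image it
    # describes and push it away from a GAN-edited copy of that image in
    # which a single attribute (e.g., color or size) has been changed.
    d_pos = F.pairwise_distance(audio_emb, pos_img_emb)
    d_neg = F.pairwise_distance(audio_emb, neg_img_emb)
    return F.relu(margin + d_pos - d_neg).mean()

# Random tensors stand in for the outputs of an audio encoder and an image
# encoder (a batch of 8 triplets with 512-dimensional embeddings).
audio    = torch.randn(8, 512)
positive = torch.randn(8, 512)  # embedding of the image the caption describes
negative = torch.randn(8, 512)  # embedding of the attribute-edited GAN image
loss = caption_image_triplet_loss(audio, positive, negative)

In the procedure described above, the negative example would come from editing the GAN's representation so that the pair differs in exactly one informative attribute, which is what lets the curriculum focus training on that attribute.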
Year: 2019
DOI: 10.1109/CVPR.2019.00213
Venue: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Field: Computer vision, Computer science, Human–computer interaction, Artificial intelligence
DocType: Conference
ISSN: 1063-6919
Citations: 1
PageRank: 0.34
References: 0
Authors: 6
Name               Order   Citations   PageRank
Didac Suris        1       1           0.68
Adrià Recasens     2       74          6.55
David Bau          3       149         9.18
David F. Harwath   4       63          8.34
James Glass        5       3123        413.63
Antonio Torralba   6       14607       956.27