Title
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Abstract
This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient, compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective for finding arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task. We compare our method with existing sequential encoding and embedding networks, demonstrating superior performance on two proposed benchmarks: automatic image retrieval on a simulated scenario that uses region captions as queries, and interactive image retrieval using real queries from human evaluators.
Year: 2019
Venue: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019)
Keywords: interactive image retrieval
Field: Computer science, Drill down, Natural language, Natural language processing, Artificial intelligence, Machine learning
DocType: Conference
Volume: 32
ISSN: 1049-5258
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name                      Order  Citations  PageRank
Fuwen Tan                 1      8          2.47
Cascante-Bonilla, Paola   2      0          0.34
Guo, Xiaoxiao             3      335        23.26
Hui Wu                    4      38         4.65
Song Feng                 5      280        19.55
Vicente Ordonez           6      1418       69.65