Paper Info

Title
Using Syntax to Ground Referring Expressions in Natural Images.

Abstract
We introduce GroundNet, a neural network for referring expression recognition - the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the localization of the target object is determined by the network. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects.

Year	Venue	DocType
2018	Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence	Journal
Volume	Citations	PageRank
abs/1805.10547	9	0.46
References	Authors
16	3

Authors (3 rows)

Name	Order	Citations	PageRank
Volkan Cirik	1	81	5.41
Taylor Berg-Kirkpatrick	2	554	35.93
Louis-Philippe Morency	3	3220	200.79