Title
Text-to-Image Generation Grounded by Fine-Grained User Attention
Abstract
Localized Narratives [28] is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TRECS, a sequential model that exploits this grounding to generate images. TRECS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are u...
Year
2021
DOI
10.1109/WACV48630.2021.00028
Venue
2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
Keywords
Measurement, Image segmentation, Visualization, Computer vision, Grounding, Conferences, Natural languages
DocType
Conference
ISSN
2472-6737
ISBN
978-1-6654-0477-8
Citations
0
PageRank
0.34
References
5
Authors
4
Name | Order | Citations | PageRank
Jing Yu Koh | 1 | 0 | 0.34
Jason Baldridge | 2 | 933 | 69.95
Honglak Lee | 3 | 6247 | 398.39
Yinfei Yang | 4 | 0 | 0.34