Title
Learning Cross-Modal Representations For Language-Based Image Manipulation
Abstract
In this paper, we propose a generative architecture for manipulating images and scenes with natural language descriptions. This is a challenging task, as the generative network is expected to carry out the given text instruction without changing the contents of the input image that the instruction does not refer to. Two main drawbacks of existing methods are their restriction to changes that affect only a limited region and their inability to handle complex instructions. To address these limitations, the proposed approach first uses two networks to extract image and text features, respectively. Rather than simply combining these two modalities during image manipulation, we use an improved technique to compose the image and text features. In addition, the generative network employs similarity learning to improve text-guided manipulation and to enforce only text-relevant changes on the input image. Experiments on the CSS and Fashion Synthesis datasets show that the proposed approach performs remarkably well and outperforms the baseline frameworks in terms of R-precision and FID.
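The abstract outlines two ingredients: a composition of image and text features that goes beyond simple concatenation, and a similarity-learning objective that keeps edits text-relevant. The sketch below is a hypothetical illustration of such a pipeline in PyTorch; the names (GatedComposer, similarity_loss), the gated-residual form, and the triplet-style loss are assumptions for illustration, not the paper's actual architecture or objective.

```python
# Hypothetical sketch (not the authors' released code): composing image and text
# features and a triplet-style similarity loss, assuming PyTorch and upstream
# encoders that map an image and an instruction to fixed-size feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedComposer(nn.Module):
    """Combine an image feature and a text feature with a gated residual,
    one plausible alternative to plain concatenation (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.res = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, img_feat, txt_feat):
        joint = torch.cat([img_feat, txt_feat], dim=-1)
        # The gate preserves text-irrelevant parts of the image feature;
        # the residual injects the modification requested by the text.
        return self.gate(joint) * img_feat + self.res(joint)

def similarity_loss(composed, target, margin=0.2):
    """Triplet-style hinge loss: pull each composed feature toward the feature
    of its correctly manipulated image and push it away from other targets in
    the batch (a generic similarity-learning formulation, used for illustration)."""
    composed = F.normalize(composed, dim=-1)
    target = F.normalize(target, dim=-1)
    sims = composed @ target.t()                  # batch similarity matrix
    pos = sims.diag().unsqueeze(1)                # similarity of matching pairs
    mask = 1.0 - torch.eye(sims.size(0))          # zero out the positives
    return (F.relu(margin + sims - pos) * mask).mean()

# Minimal usage with random features standing in for encoder outputs.
if __name__ == "__main__":
    batch, dim = 8, 256
    composer = GatedComposer(dim)
    img_feat = torch.randn(batch, dim)   # from an image encoder
    txt_feat = torch.randn(batch, dim)   # from a text encoder
    tgt_feat = torch.randn(batch, dim)   # feature of the ground-truth edited image
    loss = similarity_loss(composer(img_feat, txt_feat), tgt_feat)
    loss.backward()
    print(float(loss))
```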
Year
2020
DOI
10.1109/ICIP40778.2020.9191228
Venue
2020 IEEE International Conference on Image Processing (ICIP)
Keywords
Generative Adversarial Networks, Image Manipulation with Language, Image Editing
DocType
Conference
ISSN
1522-4880
Citations
2
PageRank
0.35
References
0
Authors
3
Name            Order  Citations  PageRank
Ak, Kenan E.    1      14         2.52
Ying Sun        2      224        19.86
Joo-Hwee Lim    3      783        82.45