Title
Learning Cross-Modal Representations For Language-Based Image Manipulation
Abstract
In this paper, we propose a generative architecture for manipulating images and scenes with natural language descriptions. This is a challenging task, as the generative network is expected to carry out the given text instruction without changing the contents of the input image that the instruction does not refer to. Two main drawbacks of existing methods are their restriction to changes that affect only a limited region and their inability to handle complex instructions. To address these limitations, the proposed approach first uses two networks to extract image and text features, respectively. Rather than simply combining these two modalities during image manipulation, we use an improved technique to compose the image and text features. In addition, the generative network employs similarity learning to improve text-guided manipulation and to enforce only text-relevant changes on the input image. Experiments on the CSS and Fashion Synthesis datasets show that the proposed approach performs remarkably well and outperforms the baseline frameworks in terms of R-precision and FID.
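The abstract outlines two ingredients: a composition of image and text features that goes beyond simple concatenation, and a similarity-learning objective that keeps edits text-relevant. The sketch below is a hypothetical illustration of such a pipeline in PyTorch; the names (GatedComposer, similarity_loss), the gated-residual form, and the triplet-style loss are assumptions for illustration, not the paper's actual architecture or objective.

```python
# Hypothetical sketch (not the authors' released code): composing image and text
# features and a triplet-style similarity loss, assuming PyTorch and upstream
# encoders that map an image and an instruction to fixed-size feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedComposer(nn.Module):
    """Combine an image feature and a text feature with a gated residual,
    one plausible alternative to plain concatenation (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.res = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, img_feat, txt_feat):
        joint = torch.cat([img_feat, txt_feat], dim=-1)
        # The gate preserves text-irrelevant parts of the image feature;
        # the residual injects the modification requested by the text.
        return self.gate(joint) * img_feat + self.res(joint)

def similarity_loss(composed, target, margin=0.2):
    """Triplet-style hinge loss: pull each composed feature toward the feature
    of its correctly manipulated image and push it away from other targets in
    the batch (a generic similarity-learning formulation, used for illustration)."""
    composed = F.normalize(composed, dim=-1)
    target = F.normalize(target, dim=-1)
    sims = composed @ target.t()                  # batch similarity matrix
    pos = sims.diag().unsqueeze(1)                # similarity of matching pairs
    mask = 1.0 - torch.eye(sims.size(0))          # zero out the positives
    return (F.relu(margin + sims - pos) * mask).mean()

# Minimal usage with random features standing in for encoder outputs.
if __name__ == "__main__":
    batch, dim = 8, 256
    composer = GatedComposer(dim)
    img_feat = torch.randn(batch, dim)   # from an image encoder
    txt_feat = torch.randn(batch, dim)   # from a text encoder
    tgt_feat = torch.randn(batch, dim)   # feature of the ground-truth edited image
    loss = similarity_loss(composer(img_feat, txt_feat), tgt_feat)
    loss.backward()
    print(float(loss))
```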
Year
2020
DOI
10.1109/ICIP40778.2020.9191228
Venue
2020 IEEE International Conference on Image Processing (ICIP)
Keywords
Generative Adversarial Networks, Image Manipulation with Language, Image Editing
DocType
Conference
ISSN
1522-4880
Citations
2
PageRank
0.35
References
0
Authors
3
Name            Order  Citations  PageRank
Ak, Kenan E.    1      14         2.52
Ying Sun        2      224        19.86
Joo-Hwee Lim    3      783        82.45