| Abstract |
|---|
| Text-to-image translation has become an attractive yet challenging task in computer vision. Previous approaches tend to generate similar, or even monotonous, images for distinct texts and overlook the characteristics of specific sentences. In this paper, we aim to generate images from the given texts while preserving the diverse appearances and modes of the objects or instances they contain. To achieve this, a novel learning model named SuperGAN is proposed, which consists of two major components: an image synthesis network and a captioning model in a Cycle-GAN framework. SuperGAN adopts a cycle-consistent adversarial training strategy to learn an image generator whose generated images have a feature distribution that complies with the distribution of generic images. Meanwhile, a cycle-consistency loss is applied to constrain the captions of the generated images to be close to the original texts. Extensive experiments on the benchmark dataset Oxford-flowers-102 demonstrate the validity and effectiveness of our proposed method. In addition, a new evaluation metric is proposed to measure the diversity of synthetic results. |
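The cycle-consistency idea described in the abstract — a generator maps text to an image, a captioning model maps the image back to text, and the loss penalizes the distance between the original and reconstructed text — can be sketched as follows. This is a toy illustration with hypothetical linear stand-ins for the networks, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(text_emb, W_g):
    # Toy stand-in for the image synthesis network:
    # a linear map from text-embedding space to image space.
    return np.tanh(text_emb @ W_g)

def captioner(image, W_c):
    # Toy stand-in for the captioning model:
    # a linear map from image space back to text-embedding space.
    return image @ W_c

def cycle_consistency_loss(text_emb, W_g, W_c):
    # L1 distance between the original text embedding and the caption
    # recovered from the generated image (text -> image -> text).
    reconstructed = captioner(generator(text_emb, W_g), W_c)
    return np.abs(text_emb - reconstructed).mean()

text_dim, image_dim = 8, 16
W_g = rng.standard_normal((text_dim, image_dim)) * 0.1
W_c = rng.standard_normal((image_dim, text_dim)) * 0.1
text_emb = rng.standard_normal(text_dim)

loss = cycle_consistency_loss(text_emb, W_g, W_c)
print(f"cycle-consistency loss: {loss:.4f}")
```

In training, this term would be minimized jointly with an adversarial loss so that generated images both look realistic and remain faithful to the input sentence.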
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICMEW.2019.00085 | 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) |
| Keywords | Field | DocType |
|---|---|---|
| Image synthesis, Image captioning, generative adversarial networks, cycle-consistency loss | Closed captioning, Pattern recognition, Computer science, Image synthesis, Natural language, Artificial intelligence, Adversarial system | Conference |

| ISSN | ISBN | Citations |
|---|---|---|
| 2330-7927 | 978-1-5386-9215-8 | 2 |

| PageRank | References | Authors |
|---|---|---|
| 0.35 | 1 | 2 |