Title
Transformer models for enhancing AttnGAN based text to image generation
Abstract
Deep neural networks are capable of producing photographic images that depict given natural language text descriptions. Such models have huge potential in applications such as interior design, video games, image editing, and facial sketching for digital forensics. However, only a limited number of methods in the literature have been developed for text-to-image (TTI) generation, and most of them use Generative Adversarial Network (GAN) based deep learning methods. Attentional GAN (AttnGAN) is a popular GAN-based TTI method that extracts meaningful information from the given text descriptions using an attention mechanism. In this paper, we investigate the use of different Transformer models, such as BERT, GPT-2, and XLNet, with AttnGAN to address the challenge of extracting semantic information from text descriptions. Accordingly, the proposed AttnGAN(TRANS) architecture has three variants: AttnGAN(BERT), AttnGAN(XL), and AttnGAN(GPT). The proposed method outperforms the conventional AttnGAN, improving the Inception Score by 27.23% and reducing the Fréchet Inception Distance by 49.9%. Our experimental results indicate that the proposed method has the potential to outperform contemporary state-of-the-art methods and validate the use of Transformer models in improving the performance of TTI generation. The code is made publicly available at https://github.com/sairamkiran9/AttnGAN-trans. (C) 2021 Elsevier B.V. All rights reserved.
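The abstract's key idea is that AttnGAN conditions image sub-region features on word embeddings through an attention mechanism, and that a Transformer text encoder can supply those embeddings. Below is a minimal NumPy sketch of that word-level attention step: each image sub-region attends over the word embeddings to build a per-region context vector. Function and variable names are illustrative assumptions, not taken from the paper's released code (available at the GitHub URL above).

```python
import numpy as np

def word_attention(image_feats, word_embs):
    """Word-level attention in the AttnGAN style (illustrative sketch).

    image_feats: (N, D) array, N image sub-regions with D-dim features.
    word_embs:   (T, D) array, T word embeddings projected to the same
                 D-dim space (e.g. produced by an RNN or a Transformer
                 text encoder such as BERT).
    Returns (context, attn): per-region context vectors (N, D) and the
    attention weights (N, T), each row summing to 1.
    """
    # Similarity between every sub-region and every word.
    scores = image_feats @ word_embs.T            # (N, T)
    # Softmax over the word axis, with max-subtraction for stability.
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # Each region's context is an attention-weighted sum of words.
    context = attn @ word_embs                    # (N, D)
    return context, attn
```

Swapping the text encoder, as the paper proposes, leaves this attention step unchanged: only the source of `word_embs` differs between the RNN baseline and the BERT/GPT-2/XLNet variants.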
Year: 2021
DOI: 10.1016/j.imavis.2021.104284
Venue: Image and Vision Computing
Keywords: Generative Adversarial Networks (GANs), Natural Language Processing (NLP), Text to image synthesis, Transformers, Attention mechanism
DocType: Journal
Volume: 115
ISSN: 0262-8856
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name                 Order  Citations  PageRank
S. Naveen            1      0          0.34
M. S. S. Ram Kiran   2      0          0.34
M. Indupriya         3      0          0.34
T. V. Manikanta      4      0          0.34
P. V. Sudeep         5      27         3.44