Abstract

Despite the recent impressive results of generative adversarial networks on text-to-image generation, generating complex scenes with multiple objects against complicated backgrounds remains challenging; moreover, end-to-end text-to-image generation still suffers from poor image quality. In this work, we propose a sequential text-to-image generation algorithm that synthesizes high-quality images (larger than 1024x1024 pixels). The proposed approach consists of location inference, key object extraction, image search, layout generation, and image harmonization stages. We compare the suggested approach with DALL-E, a state-of-the-art text-to-image generation model. The images generated by our approach, which builds on golden-section layouts, are effective and visually plausible.
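The abstract names five sequential stages. A minimal sketch of how such a pipeline could be chained is shown below; every function name, signature, and the trivial stub logic are assumptions for illustration, not the paper's actual implementation:

```python
# Hypothetical sketch of the sequential pipeline named in the abstract.
# All functions are placeholder stubs, not the authors' method.

def infer_locations(prompt):
    # Stage 1: location inference -- assign each object a position (stubbed).
    return {word: (0.0, 0.0) for word in prompt.split()}

def extract_key_objects(prompt):
    # Stage 2: key object extraction -- pick salient words (stubbed).
    return prompt.split()

def search_images(objects):
    # Stage 3: image search -- retrieve one candidate image per object (stubbed).
    return {obj: f"{obj}.png" for obj in objects}

def generate_layout(locations, objects):
    # Stage 4: layout generation, e.g. golden-section placement (stubbed).
    return [(obj, locations[obj]) for obj in objects]

def harmonize(images, layout):
    # Stage 5: image harmonization -- blend the placed images (stubbed).
    return [(images[obj], pos) for obj, pos in layout]

def generate_image(prompt):
    # Run the five stages in sequence.
    locations = infer_locations(prompt)
    objects = extract_key_objects(prompt)
    images = search_images(objects)
    layout = generate_layout(locations, objects)
    return harmonize(images, layout)
```

This only illustrates the stage ordering; each stub would be replaced by the corresponding model or retrieval component in a real system.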
| Year | DOI | Venue |
|---|---|---|
| 2021 | 10.1117/12.2622734 | Fourteenth International Conference on Machine Vision (ICMV 2021) |

| Keywords | DocType | Volume |
|---|---|---|
| text-to-image generation, transformer, layout generation | Conference | 12084 |

| ISSN | Citations | PageRank |
|---|---|---|
| 0277-786X | 0 | 0.34 |

| References | Authors |
|---|---|
| 0 | 2 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Valeria Efimova | 1 | 0 | 1.01 |
| Andrey Filchenkov | 2 | 0 | 0.68 |