Abstract |
---|
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well suited for text-to-image generation tasks because it not only eliminates the unidirectional bias of existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that VQ-Diffusion produces significantly better text-to-image generation results than conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, VQ-Diffusion can handle more complex scenes and improves the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time-consuming even for normal-size images. VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with reparameterization is fifteen times faster than traditional AR methods while achieving better image quality. The code and models are available at https://github.com/cientgu/VQ-Diffusion. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/CVPR52688.2022.01043 | IEEE Conference on Computer Vision and Pattern Recognition |
Keywords | DocType | Volume |
---|---|---|
Image and video synthesis and generation, Vision + language | Conference | 2022 |
Issue | Citations | PageRank |
---|---|---|
1 | 0 | 0.34 |
References | Authors |
---|---|
0 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shuyang Gu | 1 | 11 | 1.83 |
Dong Chen | 2 | 681 | 32.51 |
Jianmin Bao | 3 | 22 | 5.76 |
Fang Wen | 4 | 2077 | 86.88 |
Bo Zhang | 5 | 22 | 5.68 |
Dongdong Chen | 6 | 52 | 19.10 |
Lu Yuan | 7 | 801 | 48.29 |
Baining Guo | 8 | 3970 | 194.91 |