Abstract
---

Text-to-face (T2F) generation is an emerging research hot spot in multimedia, which aims to synthesize vivid portraits from given textual descriptions. Its main challenge lies in the accurate alignment from texts to image pixels, which poses a high demand on generation fidelity. In this paper, we define T2F as a pixel synthesis problem conditioned on the text and propose a novel dynamic pixel synthesis network, PixelFace, for end-to-end T2F generation. To fully exploit prior knowledge for T2F synthesis, we propose a novel dynamic parameter generation module, which transforms text features into dynamic knowledge embeddings for end-to-end pixel regression. These knowledge embeddings are example-dependent and spatially related to image pixels, enabling PixelFace to exploit text priors for high-quality text-guided face generation. To validate the proposed PixelFace, we conduct extensive experiments on the MMCelebA dataset and compare PixelFace with a set of state-of-the-art T2F and text-to-image (T2I) methods, e.g., StyleCLIP and TediGAN. The experimental results not only show the superior performance of PixelFace over the compared methods but also validate its merits over existing T2F methods in both text-image matching and inference speed. Code will be released at: https://github.com/pengjunn/PixelFace
Year | DOI | Venue |
---|---|---|
2022 | 10.1145/3503161.3547818 | International Multimedia Conference |
DocType | Citations | PageRank
---|---|---|
Conference | 0 | 0.34
References | Authors |
---|---|
0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jun Peng | 1 | 0 | 0.34 |
Xiaoxiong Du | 2 | 0 | 0.34 |
Yiyi Zhou | 3 | 0 | 0.34 |
Jing He | 4 | 0 | 0.34 |
Yunhang Shen | 5 | 29 | 7.25 |
Xiaoshuai Sun | 6 | 623 | 58.76 |
Rongrong Ji | 7 | 3616 | 189.98 |