Title |
---|
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model |
Abstract |
---|
To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method approaches these targets by deeply exploiting the power of the large-scale pre-trained vision-language model CLIP [32]. Concretely, we first Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP [31] baseline. Code is available at https://github.com/zipengxuc/PPE. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/CVPR52688.2022.01769 | IEEE Conference on Computer Vision and Pattern Recognition |
Keywords | DocType | Volume
---|---|---|
Image and video synthesis and generation, Face and gestures | Conference | 2022
Issue | Citations | PageRank
---|---|---|
1 | 0 | 0.34
References | Authors |
---|---|
0 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zipeng Xu | 1 | 0 | 0.34 |
Tianwei Lin | 2 | 54 | 6.67 |
Hao Tang | 3 | 338 | 34.83 |
Fu Li | 4 | 3 | 2.42 |
Dongliang He | 5 | 33 | 13.67 |
Nicu Sebe | 6 | 7013 | 403.03 |
Radu Timofte | 7 | 1880 | 118.45 |
Luc Van Gool | 8 | 0 | 0.34 |
Errui Ding | 9 | 142 | 29.31 |