Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model - Citegraph

Paper Info

Title
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

Abstract
To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to the pre-defined set the models were trained for. We propose a novel framework, Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method achieves these goals by exploiting the power of the large-scale pre-trained vision-language model CLIP [32]. Concretely, we first Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the recent StyleCLIP [31] baseline. Code is available at https://github.com/zipengxuc/PPE.
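To make the Predict and Prevent steps more concrete, here is a minimal toy sketch operating on CLIP-style embedding vectors. It assumes precomputed feature vectors stand in for CLIP text and image embeddings. Ranking candidate attributes by cosine similarity to the command and penalizing the mean absolute change in image-attribute similarity are illustrative assumptions, not the paper's exact formulation. The function names `predict_entangled` and `entanglement_loss` are hypothetical; see the linked repository for the authors' implementation.

```python
import numpy as np


def normalize(x):
    # L2-normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def predict_entangled(command_emb, attr_embs, attr_names, k=2):
    # Hypothetical "Predict" step (an assumption, not the paper's procedure):
    # rank candidate attribute embeddings by cosine similarity to the text
    # command embedding and keep the top-k as likely entangled attributes.
    sims = normalize(attr_embs) @ normalize(command_emb)
    order = np.argsort(-sims)[:k]
    return [attr_names[i] for i in order], order


def entanglement_loss(src_img_emb, edit_img_emb, attr_embs):
    # Hypothetical "Prevent" term: how much the edited image's similarity to
    # each predicted attribute drifted from the source image's similarity.
    # A perfectly disentangled edit leaves these similarities unchanged (loss 0).
    a = normalize(attr_embs)
    before = a @ normalize(src_img_emb)
    after = a @ normalize(edit_img_emb)
    return float(np.mean(np.abs(after - before)))


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real CLIP features.
    command = np.array([1.0, 0.0, 0.0])
    attrs = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.0, 0.7]])
    names, idx = predict_entangled(command, attrs, ["a", "b", "c"], k=2)
    src = np.array([0.2, 0.5, 0.1])
    edited = np.array([0.5, 0.2, 0.1])
    print(names, entanglement_loss(src, edited, attrs[idx]))
```

In a real training loop, such a term would be added to the text-alignment objective so that the edit moves toward the command while keeping similarity to the predicted entangled attributes fixed. Here the toy run selects attributes `a` and `c`, the two most aligned with the command vector.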

Year	2022
DOI	10.1109/CVPR52688.2022.01769
Venue	IEEE Conference on Computer Vision and Pattern Recognition
Keywords	Image and video synthesis and generation, Face and gestures
DocType	Conference
Volume	2022
Issue	1
Citations	0
PageRank	0.34
References	0
Authors	9

Authors

Name	Order	Citations	PageRank
Zipeng Xu	1	0	0.34
Tianwei Lin	2	54	6.67
Hao Tang	3	338	34.83
Fu Li	4	3	2.42
He, D.	5	33	13.67
Nicu Sebe	6	7013	403.03
Radu Timofte	7	1880	118.45
Luc Van Gool	8	0	0.34
Er-rui Ding	9	142	29.31
