CKD: Cross-task Knowledge Distillation for Text-to-image Synthesis - Citegraph

Paper Info

Title
CKD: Cross-task Knowledge Distillation for Text-to-image Synthesis

Abstract
Text-to-image synthesis (T2IS), which automatically generates images conditioned on text descriptions, has drawn increasing interest recently. It is a highly challenging task that learns a mapping from the semantic space of text descriptions to the complex RGB pixel space of images. The main issues of T2IS lie in two aspects: semantic consistency and visual quality. The distributions of text descriptions and image contents are inconsistent since they belong to different modalities, so it is difficult to generate images whose semantic contents are consistent with the text descriptions; this is the semantic consistency issue. Moreover, due to the discrepancy between the data distributions of real and synthetic images in the huge pixel space, it is hard to approximate the real data distribution for synthesizing photo-realistic images; this is the visual quality issue. To address these issues, we propose a cross-task knowledge distillation (CKD) approach that transfers knowledge from multiple image semantic understanding tasks into the T2IS task. Image semantic understanding tasks contain a large amount of knowledge about translating image contents into semantic representations, which helps address the semantic consistency and visual quality issues of T2IS. Moreover, we design a multi-stage knowledge distillation paradigm that decomposes the distillation process into multiple stages. With this paradigm, the model can effectively approximate the distribution of real images and understand textual information for T2IS, which improves the visual quality and semantic consistency of synthetic images. Comprehensive experiments on widely used datasets show the effectiveness of our proposed CKD approach.

Field	Value
Year	2020
DOI	10.1109/TMM.2019.2951463
Venue	IEEE Transactions on Multimedia
Keywords	Semantics, Visualization, Task analysis, Image synthesis, Generative adversarial networks, Neural networks, Image color analysis
DocType	Journal
Volume	22
Issue	8
ISSN	1520-9210
Citations	7
PageRank	0.45
References	0
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Mingkuan Yuan	1	71	3.75
Yuxin Peng	2	1122	74.90
