Title
CKD: Cross-task Knowledge Distillation for Text-to-image Synthesis
Abstract
Text-to-image synthesis (T2IS) has drawn increasing interest recently, which can automatically generate images conditioned on text descriptions. It is a highly challenging task that learns a mapping from a semantic space of text description to a complex RGB pixel space of image. The main issues of T2IS lie in two aspects: <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">semantic consistency</italic> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">visual quality</italic> . The distributions between text descriptions and image contents are inconsistent since they belong to different modalities. So it is ambitious to generate images containing consistent semantic contents with the text descriptions, which is the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">semantic consistency</italic> issue. Moreover, due to the discrepancy of data distributions between real and synthetic images in huge pixel space, it is hard to approximate the real data distribution for synthesizing photo-realistic images, which is the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">visual quality</italic> issue. For addressing the above issues, we propose a cross-task knowledge distillation (CKD) approach to transfer knowledge from multiple image semantic understanding tasks into T2IS task. There is amount of knowledge in image semantic understanding tasks to translate image contents into semantic representation, which is advantageous to address the issues of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">semantic consistency</italic> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">visual quality</italic> for T2IS. Moreover, we design a multi-stage knowledge distillation paradigm to decompose the distillation process into multiple stages. By this paradigm, it is effective to approximate the distributions of real image and understand textual information for T2IS, which can improve the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">visual quality</italic> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">semantic consistency</italic> of synthetic images. Comprehensive experiments on widely-used datasets show the effectiveness of our proposed CKD approach.
Year
DOI
Venue
2020
10.1109/TMM.2019.2951463
IEEE Transactions on Multimedia
Keywords
DocType
Volume
Semantics,Visualization,Task analysis,Image synthesis,Generative adversarial networks,Neural networks,Image color analysis
Journal
22
Issue
ISSN
Citations 
8
1520-9210
7
PageRank 
References 
Authors
0.45
0
2
Name
Order
Citations
PageRank
Mingkuan Yuan1713.75
Yuxin Peng2112274.90