Title
Unsupervised Multimodal Video-To-Video Translation Via Self-Supervised Learning
Abstract
Existing unsupervised video-to-video translation methods fail to produce translated videos that are frame-wise realistic, preserve semantic information, and remain consistent at the video level. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes style and content, uses a specialized encoder-decoder structure, and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables us to achieve style-consistent video translation results and provides a good interface for modality-flexible translation. In addition, by changing the input frames and style codes incorporated in the translation, we propose a video interpolation loss, which captures temporal information within the sequence to train our building blocks in a self-supervised manner. Our model can produce photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods.
Year: 2021
DOI: 10.1109/WACV48630.2021.00107
Venue: 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021)
DocType: Conference
ISSN: 2472-6737
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name           Order  Citations  PageRank
Liu Kangning   1      0          0.34
Shuhang Gu     2      701        28.25
Andrés Romero  3      9          3.33
Radu Timofte   4      1880       118.45