Unsupervised Multimodal Video-To-Video Translation Via Self-Supervised Learning

Paper Info

Title
Unsupervised Multimodal Video-To-Video Translation Via Self-Supervised Learning

Abstract
Existing unsupervised video-to-video translation methods fail to produce translated videos that are frame-wise realistic, semantics-preserving, and consistent at the video level. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes style and content, uses a specialized encoder-decoder structure, and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables style-consistent video translation and also provides a convenient interface for modality-flexible translation. In addition, by varying the input frames and the style codes used in the translation, we formulate a video interpolation loss that captures temporal information within the sequence and trains our building blocks in a self-supervised manner. Our model produces photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods.
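The abstract names the components but not their mechanics. The sketch below is a toy illustration of the general idea behind a self-supervised interpolation loss with bidirectional recurrence, not UVIT's implementation: a frame is hidden, predicted only from the forward state up to the previous frame and the backward state from the next frame, and the prediction is compared with the real frame. Everything here is an assumption for illustration: frames are short lists of floats, the "RNN unit" is a hypothetical elementwise tanh recurrence with fixed weights `w_h` and `w_x`, the "decoder" simply averages the two neighbouring states, and style codes are omitted entirely. The names `rnn_step`, `bidirectional_states`, and `interpolation_loss` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rnn_step(state: Vector, frame: Vector, w_h: float = 0.5, w_x: float = 0.5) -> Vector:
    """Hypothetical RNN unit: one elementwise tanh recurrence step."""
    return [math.tanh(w_h * h + w_x * x) for h, x in zip(state, frame)]


def bidirectional_states(frames: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Run the recurrence in both directions.

    fwd[t] has seen frames 0..t; bwd[t] has seen frames t..T-1.
    """
    dim = len(frames[0])
    fwd, h = [], [0.0] * dim
    for f in frames:
        h = rnn_step(h, f)
        fwd.append(h)
    bwd, h = [], [0.0] * dim
    for f in reversed(frames):
        h = rnn_step(h, f)
        bwd.append(h)
    bwd.reverse()  # realign so bwd[t] corresponds to frame t
    return fwd, bwd


def interpolation_loss(frames: List[Vector]) -> float:
    """Mean L1 error when each interior frame t is predicted from fwd[t-1] and bwd[t+1].

    Neither state has seen frame t, so the real frame serves as a free
    (self-supervised) target. The averaging "decoder" is a placeholder.
    """
    fwd, bwd = bidirectional_states(frames)
    total, count = 0.0, 0
    for t in range(1, len(frames) - 1):
        pred = [(a + b) / 2 for a, b in zip(fwd[t - 1], bwd[t + 1])]
        total += sum(abs(p - x) for p, x in zip(pred, frames[t]))
        count += len(pred)
    return total / count if count else 0.0
```

For example, `interpolation_loss([[0.0], [1.0], [2.0]])` scores how well the middle frame is recovered from its neighbours; in a trainable model the recurrence and decoder weights would be learned to drive this value down.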

Year: 2021
DOI: 10.1109/WACV48630.2021.00107
Venue: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021)
DocType: Conference
ISSN: 2472-6737
Citations: 0
PageRank: 0.34
References: 0
Authors: 4

Authors

Name	Order	Citations	PageRank
Liu Kangning	1	0	0.34
Shuhang Gu	2	701	28.25
Andrés Romero	3	9	3.33
Radu Timofte	4	1880	118.45