Abstract |
---|
This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim to allow a user to control the generated video by selecting a discrete action at every time step, as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as its main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with a wide variety of environments. |
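The abstract describes an encoder-decoder pipeline whose discrete action labels form an information bottleneck, trained end-to-end with a reconstruction loss. Below is a minimal sketch of that idea; all module names, layer sizes, the number of actions, and the Gumbel-softmax discretization are illustrative assumptions rather than the authors' exact architecture.

```python
# Sketch: encoder-decoder with a discrete action bottleneck, trained with
# a reconstruction loss (hypothetical modules; not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 7  # assumed size of the learned discrete action space

class FrameEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

class ActionNetwork(nn.Module):
    """Predicts a discrete action from two consecutive frame embeddings."""
    def __init__(self, dim=128):
        super().__init__()
        self.head = nn.Linear(2 * dim, NUM_ACTIONS)

    def forward(self, e_t, e_next):
        logits = self.head(torch.cat([e_t, e_next], dim=-1))
        # Straight-through Gumbel-softmax keeps the bottleneck discrete
        # while remaining differentiable (one plausible choice).
        return F.gumbel_softmax(logits, tau=1.0, hard=True)

class FrameDecoder(nn.Module):
    """Reconstructs the next frame from the current embedding and the action."""
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(dim + NUM_ACTIONS, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, e_t, action):
        h = self.fc(torch.cat([e_t, action], dim=-1)).view(-1, 64, 8, 8)
        return self.net(h)

encoder, actions, decoder = FrameEncoder(), ActionNetwork(), FrameDecoder()

# One self-supervised training step on a pair of consecutive frames:
frame_t = torch.rand(4, 3, 32, 32)
frame_next = torch.rand(4, 3, 32, 32)
e_t, e_next = encoder(frame_t), encoder(frame_next)
a = actions(e_t, e_next)               # discrete action bottleneck
recon = decoder(e_t, a)                # predicted next frame
loss = F.mse_loss(recon, frame_next)   # main driving reconstruction loss
loss.backward()
```

At inference time the action network is bypassed: the user supplies a one-hot action at every step, which is what makes the generated video playable.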
Year | DOI | Venue
---|---|---
2021 | 10.1109/CVPR46437.2021.00993 | 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)

DocType | ISSN | Citations
---|---|---
Conference | 1063-6919 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Willi Menapace | 1 | 0 | 0.68 |
Stéphane Lathuilière | 2 | 0 | 0.68 |
Sergey Tulyakov | 3 | 28 | 9.28 |
Aliaksandr Siarohin | 4 | 7 | 2.83 |
Elisa Ricci 0002 | 5 | 1393 | 73.75 |