Title | ||
---|---|---|
Mutual Information Based Method For Unsupervised Disentanglement Of Video Representation |
Abstract | ||
---|---|---|
Video prediction is a challenging but interesting task of predicting future frames from a given set of context frames belonging to a video sequence. Video prediction models have prospective applications in maneuver planning, healthcare, autonomous navigation and simulation. One of the major challenges in future frame generation is the high-dimensional nature of visual data. To handle this, we propose a Mutual Information Predictive Auto-Encoder (MIPAE) framework that simplifies the prediction of high-dimensional video frames by factorising video representations into content and low-dimensional pose latent variables. Our approach leverages the temporal structure in the latent generative factors of video sequences by applying a novel mutual information loss to learn disentangled video representations. A standard LSTM network predicts the low-dimensional pose representations, and the content and predicted pose representations are then decoded to generate future frames. We also propose a metric based on the mutual information gap (MIG) to quantitatively assess the effectiveness of disentanglement on the DSprites and MPI3D-real datasets. MIG scores corroborate the visual superiority of frames predicted by MIPAE. We also compare our method quantitatively on the LPIPS, SSIM and PSNR evaluation metrics. |
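The abstract does not spell out its MIG-based metric, so the sketch below shows only the commonly used mutual information gap, not the authors' variant. For each ground-truth factor v_k, MIG takes the gap between the two latent dimensions with the highest mutual information with v_k, normalises it by H(v_k), and averages over factors. The function names, the NumPy-only implementation, and the 20-bin histogram discretisation of continuous latents are illustrative assumptions.

```python
import numpy as np

def discrete_mutual_info(x, y):
    """I(X; Y) in nats for two 1-D arrays of discrete codes."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)   # empirical joint counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of X, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, ny)
    nz = joint > 0                          # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def entropy(y):
    """Shannon entropy H(Y) in nats of a 1-D array of discrete codes."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def discretize(z, bins=20):
    """Map each continuous latent column to integer bin codes 0..bins-1 (assumed binning scheme)."""
    return np.stack(
        [np.digitize(col, np.histogram_bin_edges(col, bins)[1:-1]) for col in z.T],
        axis=1,
    )

def mig(latents, factors, bins=20):
    """Standard MIG: latents is (N, J) with J >= 2, factors is (N, K) of discrete ground-truth factors."""
    codes = discretize(latents, bins)
    gaps = []
    for k in range(factors.shape[1]):
        v = factors[:, k]
        h = entropy(v)
        if h == 0:  # a constant factor carries no information; skip it
            continue
        # MI of every latent dimension with factor k, sorted high to low
        mi = np.sort([discrete_mutual_info(codes[:, j], v) for j in range(codes.shape[1])])[::-1]
        gaps.append((mi[0] - mi[1]) / h)
    return float(np.mean(gaps))
```

A perfectly disentangled code, where each latent dimension tracks exactly one factor, scores close to 1. A code that copies the same mixture of all factors into every dimension scores 0, because the top two dimensions are equally informative about each factor.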
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/ICPR48806.2021.9412679 | 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) |
DocType | ISSN | Citations |
Conference | 1051-4651 | 1 |
PageRank | References | Authors |
0.36 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
P Aditya Sreekar | 1 | 1 | 0.70 |
Ujjwal Tiwari | 2 | 1 | 1.04 |
Anoop M. Namboodiri | 3 | 255 | 26.36 |