Abstract | ||
---|---|---|
In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison between different video frame prediction models, we also propose a new evaluation protocol. We use generated frames as input to a classifier trained with ground truth sequences. This criterion guarantees that models scoring high are those producing sequences which preserve discriminative features, as opposed to merely penalizing any deviation, plausible or not, from the ground truth. Our proposed approach compares favourably against more sophisticated ones on the UCF-101 data set, while also being more efficient in terms of the number of parameters and computational cost. |
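
The evaluation protocol described in the abstract can be illustrated with a minimal sketch: generated clips are passed to a fixed classifier, and the score is the fraction that still receive the correct label. The names `classification_score` and `toy_classifier`, and the brightness-threshold rule, are assumptions made for illustration only. The paper's actual classifier is trained on ground truth video sequences and is not reproduced here.

```python
from typing import Callable, Hashable, Sequence, TypeVar

Clip = TypeVar("Clip")


def classification_score(
    classify: Callable[[Clip], Hashable],
    generated_clips: Sequence[Clip],
    labels: Sequence[Hashable],
) -> float:
    """Return the fraction of generated clips that a fixed classifier labels correctly."""
    if len(generated_clips) != len(labels):
        raise ValueError("need exactly one label per clip")
    if not generated_clips:
        raise ValueError("need at least one clip")
    correct = sum(
        1 for clip, label in zip(generated_clips, labels) if classify(clip) == label
    )
    return correct / len(labels)


def toy_classifier(clip):
    # Hypothetical stand-in for a classifier trained on ground truth sequences:
    # it labels a clip by its mean pixel value.
    total = sum(sum(frame) for frame in clip)
    count = sum(len(frame) for frame in clip)
    return "bright" if total / count > 0.5 else "dark"


# Three one-frame "generated" clips, each frame a flat list of pixel values.
clips = [[[0.9, 0.8]], [[0.1, 0.2]], [[0.7, 0.9]]]
labels = ["bright", "dark", "dark"]
score = classification_score(toy_classifier, clips, labels)
```

Unlike a pixel-wise error, a score of this kind does not penalize a generated clip for deviating from the ground truth as long as the classifier still recognizes its class.
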
Year | Venue | Field |
---|---|---|
2017 | arXiv: Learning | Computer science, Ground truth, Unsupervised learning, Pixel, Artificial intelligence, Predictive modelling, Classifier (linguistics), Machine learning |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1701.08435 | 10 |
PageRank | References | Authors |
---|---|---|
0.53 | 8 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Joost R. van Amersfoort | 1 | 35 | 1.95 |
Anitha Kannan | 2 | 570 | 46.43 |
Marc'Aurelio Ranzato | 3 | 5242 | 470.94 |
Arthur Szlam | 4 | 103 | 5.05 |
Du Tran | 5 | 1289 | 38.35 |
Soumith Chintala | 6 | 2056 | 102.09 |