Abstract |
---|
Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. We are also competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0% top-1 accuracy on a linear probe of our features. |
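The abstract's evaluation protocol of choice is linear probing: the pretrained model is frozen and only a linear classifier is trained on its features. Below is a minimal sketch of that idea, assuming synthetic stand-in features; in iGPT the features would be hidden states of the autoregressive Transformer, and the names here are illustrative, not from the paper's code.

```python
import numpy as np

# Hypothetical frozen features X (in iGPT: Transformer hidden states) and labels y.
# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n, d, k = 200, 16, 2
X = rng.normal(size=(n, d))          # "frozen" features, never updated
w_true = rng.normal(size=(d, k))
y = (X @ w_true).argmax(axis=1)      # labels that are linear in the features

# Linear probe: train only a softmax-regression layer on top of the frozen features.
W = np.zeros((d, k))
for _ in range(500):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(k)[y]
    grad = X.T @ (p - onehot) / n    # gradient of mean cross-entropy w.r.t. W
    W -= 0.5 * grad

acc = ((X @ W).argmax(axis=1) == y).mean()
```

Because the backbone stays fixed, probe accuracy directly measures how linearly separable the learned representation makes the classes, which is why the paper reports it alongside full fine-tuning.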
Year | Venue | DocType |
---|---|---|
2020 | ICML | Conference |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mark Chen | 1 | 0 | 1.35 |
Alec Radford | 2 | 2165 | 75.60 |
Rewon Child | 3 | 38 | 3.79 |
Jeffrey K Wu | 4 | 0 | 0.68 |
Heewoo Jun | 5 | 11 | 1.53 |
David Luan | 6 | 0 | 0.34 |
Ilya Sutskever | 7 | 25814 | 1120.24 |