Abstract |
---|
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general. |
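
To make concrete what the abstract means by specifying tasks and few-shot demonstrations "purely via text interaction" with no gradient updates, the sketch below builds a K-shot prompt for the 3-digit arithmetic task mentioned above. The `Q:`/`A:` prompt format and the `complete` call are illustrative assumptions, not the exact formats or API used in the paper.

```python
# Minimal sketch of few-shot ("in-context") prompting: the task is conveyed
# to the model only through a text prompt containing K worked demonstrations,
# with no fine-tuning or gradient updates. The prompt format here is an
# illustrative assumption, not the paper's exact format.

def build_few_shot_prompt(demonstrations, query):
    """Concatenate K question/answer demonstrations followed by the new query."""
    lines = []
    for question, answer in demonstrations:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model is asked to continue from here
    return "\n".join(lines)

# 3-digit addition, one of the on-the-fly tasks mentioned in the abstract.
demos = [
    ("What is 123 plus 456?", "579"),
    ("What is 803 plus 119?", "922"),
    ("What is 240 plus 655?", "895"),
]
prompt = build_few_shot_prompt(demos, "What is 312 plus 460?")
print(prompt)
# The prompt would then be sent to the language model for completion, e.g.
# answer = complete(prompt)  # `complete` is a hypothetical sampling call
```
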
Year | Venue | DocType | Volume | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---
2020 | NIPS 2020 | Conference | 33 | 2 | 0.36 | 0 | 31
Name | Order | Citations | PageRank |
---|---|---|---
Tom B. Brown | 1 | 63 | 3.61 |
Benjamin Mann | 2 | 2 | 0.36
Nick Ryder | 3 | 2 | 2.05 |
Melanie Subbiah | 4 | 2 | 0.36
Jared Kaplan | 5 | 21 | 1.61 |
John Schulman | 6 | 1806 | 66.95 |
Arvind Neelakantan | 7 | 408 | 17.77 |
Pranav Shyam | 8 | 3 | 1.39 |
Girish Sastry | 9 | 6 | 0.77 |
Amanda Askell | 10 | 2 | 0.36
Sandhini Agarwal | 11 | 2 | 0.36
Ariel Herbert-Voss | 12 | 3 | 1.06 |
Gretchen Krueger | 13 | 2 | 0.36
Tom Henighan | 14 | 2 | 0.36
Rewon Child | 15 | 38 | 3.79 |
Aditya Ramesh | 16 | 2 | 0.36
Daniel Ziegler | 17 | 55 | 2.99 |
Jeffrey Wu | 18 | 2 | 0.36
Clemens Winter | 19 | 2 | 0.36
Christopher Hesse | 20 | 6 | 1.09 |
Mark Chen | 21 | 2 | 0.36
Eric Sigler | 22 | 2 | 0.36
Mateusz Litwin | 23 | 2 | 0.36
Scott Gray | 24 | 2 | 0.36
Benjamin Chess | 25 | 2 | 0.36
Jack Clark | 26 | 6 | 0.80 |
Christopher Berner | 27 | 4 | 0.74 |
Sam McCandlish | 28 | 9 | 1.16 |
Alec Radford | 29 | 2165 | 75.60
Ilya Sutskever | 30 | 25814 | 1120.24 |
Dario Amodei | 31 | 455 | 17.92 |