Abstract |
---|
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general. |
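
To make concrete what the abstract means by specifying tasks and few-shot demonstrations "purely via text interaction" with no gradient updates, the sketch below builds a K-shot prompt for the 3-digit arithmetic task mentioned above. The `Q:`/`A:` prompt format and the `complete` call are illustrative assumptions, not the exact formats or API used in the paper.

```python
# Minimal sketch of few-shot ("in-context") prompting: the task is conveyed
# to the model only through a text prompt containing K worked demonstrations,
# with no fine-tuning or gradient updates. The prompt format here is an
# illustrative assumption, not the paper's exact format.

def build_few_shot_prompt(demonstrations, query):
    """Concatenate K question/answer demonstrations followed by the new query."""
    lines = []
    for question, answer in demonstrations:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model is asked to continue from here
    return "\n".join(lines)

# 3-digit addition, one of the on-the-fly tasks mentioned in the abstract.
demos = [
    ("What is 123 plus 456?", "579"),
    ("What is 803 plus 119?", "922"),
    ("What is 240 plus 655?", "895"),
]
prompt = build_few_shot_prompt(demos, "What is 312 plus 460?")
print(prompt)
# The prompt would then be sent to the language model for completion, e.g.
# answer = complete(prompt)  # `complete` is a hypothetical sampling call
```
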
Year | Venue | DocType | Volume | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---
2020 | NIPS 2020 | Conference | 33 | 2 | 0.36 | 0 | 31
Name | Order | Citations | PageRank |
---|---|---|---
Tom B. Brown | 1 | 63 | 3.61 |
Benjamin Mann | 2 | 2 | 0.36
Nick Ryder | 3 | 2 | 2.05 |
Melanie Subbiah | 4 | 2 | 0.36
Jared Kaplan | 5 | 21 | 1.61 |
John Schulman | 6 | 1806 | 66.95 |
Arvind Neelakantan | 7 | 408 | 17.77 |
Pranav Shyam | 8 | 3 | 1.39 |
Girish Sastry | 9 | 6 | 0.77 |
Amanda Askell | 10 | 2 | 0.36
Sandhini Agarwal | 11 | 2 | 0.36
Ariel Herbert-Voss | 12 | 3 | 1.06 |
Gretchen Krueger | 13 | 2 | 0.36
Tom Henighan | 14 | 2 | 0.36
Rewon Child | 15 | 38 | 3.79 |
Aditya Ramesh | 16 | 2 | 0.36
Daniel Ziegler | 17 | 55 | 2.99 |
Jeffrey Wu | 18 | 2 | 0.36
Clemens Winter | 19 | 2 | 0.36
Christopher Hesse | 20 | 6 | 1.09 |
Mark Chen | 21 | 2 | 0.36
Eric Sigler | 22 | 2 | 0.36
Mateusz Litwin | 23 | 2 | 0.36
Scott Gray | 24 | 2 | 0.36
Benjamin Chess | 25 | 2 | 0.36
Jack Clark | 26 | 6 | 0.80 |
Christopher Berner | 27 | 4 | 0.74 |
Sam McCandlish | 28 | 9 | 1.16 |
Alec Radford | 29 | 2165 | 75.60
Ilya Sutskever | 30 | 25814 | 1120.24 |
Dario Amodei | 31 | 455 | 17.92 |