We show that scaling up language models greatly improves their task-agnostic, few-shot performance on a wide variety of NLP tasks. Our largest model, GPT-3, with 175 billion parameters, is 10x larger than any previous non-sparse language model. We find that GPT-3 achieves strong performance on many NLP datasets, including translation, question answering, and cloze tasks, while still struggling on others, such as certain commonsense reasoning benchmarks. Crucially, GPT-3 achieves this performance in a "few-shot" setting, without any gradient updates or fine-tuning: it adapts to a new task simply by being given a few demonstrations in the prompt, in some cases approaching the performance of prior state-of-the-art fine-tuned models. We also discuss the broader societal impacts of our work, including potential risks and ethical considerations.
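To make the few-shot setup concrete, the sketch below builds a prompt from a handful of input-output demonstrations plus a new query and asks a causal language model to complete it. This is a minimal illustration, not the paper's evaluation code: it assumes the Hugging Face `transformers` library and uses the small public `gpt2` checkpoint as a stand-in, since GPT-3's 175B weights are not publicly downloadable. The translation prompt format follows the example shown in the paper.

```python
# Minimal sketch of few-shot ("in-context") prompting: the task is specified
# entirely through demonstrations in the prompt, with no gradient updates.
# Assumes the Hugging Face transformers library; "gpt2" is a small stand-in
# model, since GPT-3 itself is not publicly downloadable.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A few input-output demonstrations, followed by the query to complete.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)

# The model continues the pattern; only the text after the prompt is generated.
completion = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
print(completion[len(prompt):].strip())
```

A model as small as `gpt2` will generally not produce a correct continuation here; the point is only the mechanics of conditioning on demonstrations, which at GPT-3 scale yields the strong few-shot behavior described above.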
{
"id": "5a13c07e-9b4b-436f-a49b-d3767e338418",
"title": "Language Models are Few-Shot Learners (2020)",
"slug": "language-models-are-few-shot-learners",
"video_url": "https://www.youtube.com/watch?v=MbseGAyXoG4",
"url": "https://arxiv.org/abs/2005.14165",
"resource_category": "research",
"image_url": null,
"thumbnail_url": null
}