Large language models (LLMs) have shown remarkable few-shot performance on a range of tasks. We explore how to unlock the reasoning abilities of LLMs via a simple prompting method called chain-of-thought prompting. We show that chain-of-thought prompting leads models to decompose multi-step problems into intermediate reasoning steps before producing a final answer. Experiments on mathematical reasoning (GSM8K, Math23K, AQuA-RAT), symbolic reasoning (Last Letter Concatenation, Coin Flip), and commonsense reasoning (CSQA, StrategyQA, Sports Understanding) benchmarks demonstrate that chain-of-thought prompting improves the performance of LLMs across these tasks. For example, chain-of-thought prompting improves the performance of a 137B-parameter model on GSM8K from 17.9% to 58.1%, on AQuA-RAT from 29.3% to 47.2%, and on StrategyQA from 59.8% to 75.3%. The gains are especially pronounced on more complex tasks that require multiple reasoning steps. Our results suggest that chain-of-thought prompting may be a general method for improving the reasoning abilities of LLMs.
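As a concrete illustration of the technique, below is a minimal Python sketch of how a few-shot chain-of-thought prompt can be assembled: each exemplar pairs a question with a worked rationale that ends in the answer, and the new question is appended last so the model imitates the step-by-step format. The exemplar and test questions follow the style of the paper's Figure 1; the `build_cot_prompt` helper itself is an illustrative assumption, not the authors' code.

```python
# Minimal sketch of few-shot chain-of-thought prompt construction.
# The exemplars follow the style of Figure 1 in the paper; the helper
# function is illustrative, not taken from the authors' implementation.

# Each exemplar pairs a question with a worked rationale ending in the answer.
COT_EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
    (
        "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
        "how many apples do they have?",
        "The cafeteria started with 23 apples. They used 20, so 23 - 20 = 3. "
        "They bought 6 more, so 3 + 6 = 9. The answer is 9.",
    ),
]


def build_cot_prompt(question: str) -> str:
    """Concatenate exemplars (question + rationale) followed by the new question.

    Because each exemplar answer walks through intermediate steps, the model is
    expected to emit its own reasoning steps before the final answer.
    """
    parts = [f"Q: {q}\nA: {rationale}" for q, rationale in COT_EXEMPLARS]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_cot_prompt(
        "A juggler has 16 balls. Half of the balls are golf balls, and half of the "
        "golf balls are blue. How many blue golf balls are there?"
    )
    print(prompt)  # Send this string to any completion-style LLM endpoint.
```

The resulting string can be passed to any text-completion endpoint; the final answer can then be recovered by parsing the generated text after the phrase "The answer is", which is the convention the exemplars above establish.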
{
"id": "3f6930cc-8513-469f-b5ac-ebc0178d9eaf",
"title": "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)",
"slug": "chain-of-thought-prompting-elicits-reasoning-in-large-language-models",
"video_url": "https://www.youtube.com/watch?v=JtvPbTffWrI",
"url": "https://arxiv.org/abs/2201.11903",
"resource_category": "research",
"image_url": null,
"thumbnail_url": null
}