Provided by arXiv (arxiv.org)
Tags: Free · Advanced · Text Generation
Audience: Researchers · AI Engineers · Data Scientists
Introduces Chinchilla, demonstrating that compute-optimal LLMs are smaller and trained on more data than previously thought.

Overview

This foundational paper introduces Chinchilla and revolutionizes large language model (LLM) training by showing that, for a given compute budget, the optimal model is significantly smaller and should be trained on substantially more data than prevailing industry practice assumed. The researchers found that scaling up the training dataset while scaling down the model, exemplified by the 70B-parameter Chinchilla, yields better performance than much larger but undertrained models such as Gopher (280B). The paper also provides a crucial formula for determining the optimal model size for a given FLOPs budget, challenging existing scaling paradigms and promoting more efficient, higher-performing LLM development.
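
As a rough illustration of the Gopher-versus-Chinchilla comparison above, the sketch below estimates each model's training cost with the common C ≈ 6·N·D approximation (parameters × training tokens). The parameter and token counts are the publicly reported figures; the approximation is a simplification, not the paper's exact FLOP accounting.

```python
def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Rough training cost using the common C ~= 6 * N * D approximation
    (forward plus backward pass); the paper's own FLOP accounting is more detailed."""
    return 6 * n_params * n_tokens

# Publicly reported configurations: (parameters, training tokens).
gopher_flops = approx_train_flops(280e9, 300e9)      # ~5.0e23 FLOPs
chinchilla_flops = approx_train_flops(70e9, 1.4e12)  # ~5.9e23 FLOPs

# Under this approximation the two budgets land within ~20% of each other:
# Chinchilla spends comparable compute on a 4x smaller model trained on
# over 4x more data.
print(f"Gopher:     {gopher_flops:.2e} FLOPs")
print(f"Chinchilla: {chinchilla_flops:.2e} FLOPs")
```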

Abstract

Large language models (LLMs) are an exciting area of research, but attention has largely focused on increasing model size rather than on training models more efficiently. Here, we present a startling finding: for a given compute budget, the compute-optimal model is significantly smaller than the models that currently define the state of the art. We find that Chinchilla, a 70B-parameter model trained with the same compute budget as Gopher (280B), outperforms Gopher and other state-of-the-art LLMs across a wide range of downstream evaluation tasks. To achieve this, we scale up the training dataset while decreasing model size, a strategy that improves performance and helps clarify the relationship between model scale, data, compute, and performance. We derive a simple formula for the optimal model size at a given FLOPs budget and demonstrate that it holds across five orders of magnitude of compute. Our results indicate that current LLMs are significantly undertrained, leading to wasted compute and a skewed understanding of their capabilities. By providing a more efficient training paradigm, we hope to accelerate progress in the field and enable more researchers to explore the exciting possibilities of LLMs.
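
The "simple formula" referenced above is, in spirit, a power-law split of the compute budget between parameters and training tokens. The sketch below is a hedged approximation rather than the paper's fitted constants: it assumes C ≈ 6·N·D and the roughly 20-tokens-per-parameter ratio commonly read off the paper's results, under which both the optimal parameter count and the optimal token count scale as C^0.5.

```python
import math

def compute_optimal_allocation(flops_budget: float, tokens_per_param: float = 20.0):
    """Hedged Chinchilla-style split of a compute budget.

    Assumptions (not the paper's exact fitted constants):
      * training compute C ~= 6 * N * D (parameters x training tokens),
      * a compute-optimal ratio of roughly 20 tokens per parameter,
        a commonly cited reading of the paper's results.
    Under these assumptions both N_opt and D_opt scale as C ** 0.5.
    """
    n_opt = math.sqrt(flops_budget / (6.0 * tokens_per_param))  # parameters
    d_opt = tokens_per_param * n_opt                            # training tokens
    return n_opt, d_opt

# Sweeping the budget over several orders of magnitude; a Gopher-scale
# budget (~5.76e23 FLOPs) recovers roughly a 70B-parameter, 1.4T-token model,
# close to Chinchilla's reported configuration.
for budget in (1e21, 1e23, 5.76e23, 1e25):
    n, d = compute_optimal_allocation(budget)
    print(f"C = {budget:.2e} FLOPs -> N ~ {n:.2e} params, D ~ {d:.2e} tokens")
```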