Scaling Laws for Neural Language Models
Abstract: We study empirical scaling laws for language model performance on the cross-entropy loss.
Introduction
Recent advances in deep learning suggest that model performance scales predictably with three key factors (see the sketch after this list):
- Model size (parameters)
- Dataset size (tokens)
- Compute budget (FLOPs)
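To make "scales predictably" concrete, the loss is often modeled as a power law in each factor when the other two are not bottlenecks. The constants below (N_c, D_c, C_c and the exponents alpha_N, alpha_D, alpha_C) are illustrative placeholders, not values reported by this study:

```latex
% Hedged sketch: loss as a power law in each factor, with the other two
% held non-limiting. N = parameters, D = tokens, C = compute (FLOPs).
% N_c, D_c, C_c and the exponents are illustrative fitted constants.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```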
Methodology
We trained over 100 language models ranging in size from 1M to 1B parameters.
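As a rough illustration of how such a size sweep might be laid out, the sketch below spaces parameter counts evenly in log-space between 1M and 1B. The number of grid points and the spacing are assumptions for illustration, not the grid actually used in this study:

```python
import numpy as np

# Hedged sketch: a model-size sweep spanning 1M to 1B parameters,
# spaced evenly on a log scale. The point count (20) is an illustrative
# choice, not the paper's actual configuration grid.
def model_size_grid(n_min=1e6, n_max=1e9, n_points=20):
    """Return parameter counts evenly spaced in log-space."""
    return np.logspace(np.log10(n_min), np.log10(n_max), n_points)

for n_params in model_size_grid():
    # In practice each target size would map to concrete width/depth
    # hyperparameters before training.
    print(f"target model size: ~{n_params:.2e} parameters")
```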
Results
Across all three axes of scale (model size, dataset size, and compute), the cross-entropy loss follows a power-law relationship.
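A power law of the form L(N) = (N_c / N)^alpha becomes a straight line in log-log space, so its constants can be recovered by linear regression on log-transformed data. The sketch below shows this fit on synthetic placeholder points, not on measurements from this study:

```python
import numpy as np

# Hedged sketch: fit L(N) = (N_c / N)**alpha to (size, loss) pairs via
# linear regression in log-log space. The data are synthetic placeholders.
sizes = np.array([1e6, 1e7, 1e8, 1e9])    # model size N (parameters)
losses = np.array([5.0, 4.1, 3.4, 2.8])   # cross-entropy loss L(N)

# log L = alpha * log N_c - alpha * log N  =>  a line with slope -alpha
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
alpha = -slope
N_c = np.exp(intercept / alpha)

print(f"alpha_N ~= {alpha:.3f}, N_c ~= {N_c:.3e}")
```

The same fitting procedure applies to the dataset-size and compute axes by substituting D or C for N.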