Scaling Laws for Neural Language Models

An empirical study of how language model performance scales with model size, dataset size, and compute.

Abstract: We study empirical scaling laws for language model performance, as measured by cross-entropy loss.

Introduction

Recent advances in deep learning suggest that model performance scales predictably with three key factors (see the sketch after this list):

  1. Model size (parameters)
  2. Dataset size (tokens)
  3. Compute budget (FLOPs)
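
A minimal sketch of what such a scaling relationship looks like, assuming the standard single-variable power-law form L(x) = (x_c / x)^alpha in one scale factor x (model size, dataset size, or compute). The constants used below are placeholders for illustration, not values from this study.

```python
# Minimal sketch of a single-variable scaling law, assuming the standard
# power-law form L(x) = (x_c / x) ** alpha, where x is one scale factor
# (model size in parameters, dataset size in tokens, or compute in FLOPs).
# The constants x_c and alpha below are illustrative placeholders, not
# values fitted in this study.
def power_law_loss(x, x_c, alpha):
    """Cross-entropy loss predicted by a power law in a single scale factor."""
    return (x_c / x) ** alpha

# Example: predicted loss for a 100M-parameter model under placeholder constants.
print(power_law_loss(x=1e8, x_c=1e13, alpha=0.07))
```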

Methodology

We trained over 100 models ranging in size from 1M to 1B parameters.
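
As a rough illustration of what such a sweep might look like (the actual model configurations are not listed here), the snippet below generates logarithmically spaced parameter counts between 1M and 1B; the choice of ten points is an assumption for the example.

```python
import numpy as np

# Hypothetical model-size sweep: parameter counts spaced evenly in log space
# between 1M and 1B. The number of points (10) is an illustrative assumption.
def model_size_sweep(n_min=1e6, n_max=1e9, num_points=10):
    """Return logarithmically spaced parameter counts for a scaling sweep."""
    return np.logspace(np.log10(n_min), np.log10(n_max), num=num_points)

for n in model_size_sweep():
    print(f"{n:,.0f} parameters")
```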

Results

Our findings show that cross-entropy loss follows a power law in each of the three scale factors: model size, dataset size, and compute.
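
To make the power-law claim concrete, here is a minimal sketch of how such a relationship can be fit: a power law L(N) = c * N^(-alpha) is linear in log-log space, so a degree-1 polynomial fit recovers the exponent. The data, constants, and function name below are illustrative assumptions, not results from the study.

```python
import numpy as np

# Fit a power law L(N) = c * N ** (-alpha) to (model size, loss) pairs by
# linear regression in log-log space. The data below are synthetic and the
# exponent 0.08 is a placeholder, used only to demonstrate the fit.
def fit_power_law(sizes, losses):
    """Return (alpha, c) such that loss ~= c * size ** (-alpha)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return -slope, np.exp(intercept)

sizes = np.array([1e6, 1e7, 1e8, 1e9])   # parameter counts
losses = 10.0 * sizes ** -0.08           # synthetic power-law losses
alpha, c = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha:.3f}, c ~ {c:.2f}")  # recovers approximately 0.08 and 10
```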